Processing math: 100%

Thursday, July 21, 2016

Finding disordered residues in an NMR ensemble

Note to self: here's how you identified disordered residues in the NMR ensemble 2KCU.pdb

1. In Pymol: "fetch 2kcu"

2. Action > align > states (*/CA)
2016.08.07 update: the above command also aligns the tails.  Use "intra_fit (2kzn///6-158/CA)"

3. "save 2kcu_aligned.pdb, state=0"

4. In terminal: grep CA 2kcu_aligned.pdb > lis

5. python disorder.py

disorder.py (given below) calculates the standard deviation of the x, y, and z coordinate of each CA atom (\sigma_{x,i}, \sigma_{y,i}, \sigma_{z,i}). It then averages these three standard deviations for each CA atom (\sigma_i).  To find outliers, it averages these values for the entire protein (\langle \sigma_i \rangle) and computes the standard deviation of this average (\sigma_{\langle \sigma_i \rangle}). Any residues for which \sigma_i > \langle \sigma_i \rangle + \sigma_{\langle \sigma_i \rangle} is identified as disordered.

Here I've colored the disordered residues red (haven't updated the picture based on Step 2-change yet)



import numpy as np
models = 20
res = 166
filename = "lis"
file = open(filename,'r')
ave = [[0 for x in xrange(3)] for x in xrange(res)]
stdev = [[0 for x in xrange(3)] for x in xrange(res)]
coord = [[[0 for x in range(3)] for x in xrange(res)] for x in xrange(models)]
#print coord
for i in range(models):
for j in range(res):
line = file.readline()
words = string.split(line)
coord[i][j][0] = float(words[6])
coord[i][j][1] = float(words[7])
coord[i][j][2] = float(words[8])
for k in range(3):
ave[j][k] += coord[i][j][k]/models
for i in range(models):
for j in range(res):
for k in range(3):
stdev[j][k] += (coord[i][j][k]-ave[j][k])**2/models
ave_stdev = []
for j in range(res):
temp = 0.0
for k in range(3):
stdev[j][k] = math.sqrt(stdev[j][k])
temp += stdev[j][k]
# print j, ",", temp/3.0
ave_stdev.append(temp/3.0)
#tails are removed
start_res = 5 #actually 6 but we count from 0
end_res = 158 #actually 159 but we count from 0
v = np.asarray(ave_stdev[start_res:end_res])
for j in range(start_res,end_res):
if ave_stdev[j] > np.mean(v)+np.std(v):
print j+1
view raw disorder.py hosted with ❤ by GitHub
Yes, I know: "the 1970's called and want their Fortran code back". How very droll.



This work is licensed under a Creative Commons Attribution 4.0

No comments: