Proteins and Wave Functions: Manuscript in progress: Protein structure validation and refinement using amide proton chemical shifts derived from quantum mechanics

Wednesday, January 2, 2013

Anders Steen Christensen and I are working on the first draft of this manuscript. Here is the story as I see it now (stay tuned for updates).

The ProCS method
We introduce the QM-based ProCS method for predicting backbone amide proton chemical shifts given a protein structure. It is a generalization and improvement of a method I published earlier.

Reproducing QM chemical shifts
ProCS is parameterized based on B3LYP/6-311++G(d,p)//B3LYP/6-31+G(d) calculations. This level of theory has been shown to yield reliable proton chemical shifts (how reliable for small molecules?). ProCS sums several terms (backbone dihedral angles, hydrogen bonding, etc.) under an additivity assumption. If everything works perfectly, ProCS should reproduce the B3LYP/6-311++G(d,p)//B3LYP/6-31+G(d) chemical shifts for a protein structure.
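The additivity assumption can be sketched in a few lines. This is only an illustration of the idea, not the ProCS implementation: the term names and the numbers are hypothetical placeholders, and the real method has more terms than shown here.

```python
from dataclasses import dataclass


@dataclass
class ShiftTerms:
    """Hypothetical contributions to one amide proton chemical shift (ppm)."""
    backbone: float        # contribution from the backbone dihedral angles
    hydrogen_bond: float   # contribution from the hydrogen bond geometry
    other: float = 0.0     # stands in for the remaining terms ("etc.")


def additive_shift(terms: ShiftTerms) -> float:
    """Additivity assumption: the prediction is the plain sum of the terms."""
    return terms.backbone + terms.hydrogen_bond + terms.other


# Made-up numbers, chosen only to show the bookkeeping.
example = ShiftTerms(backbone=5.0, hydrogen_bond=2.5, other=0.25)
total = additive_shift(example)
```

Any coupling between the terms (non-additivity) is by construction missing from a model of this form.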

Anders has computed the B3LYP/6-311G(d,p)/PCM chemical shielding tensors for the 1ET1 x-ray structure. (The lack of diffuse functions is accounted for by a scaling factor from the literature.) ProCS reproduces the QM chemical shifts with an RMSD of 0.37 ppm. The remaining deviation must come from non-additivity.
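For reference, the RMSD quoted here is the usual root-mean-square deviation over all amide protons. A minimal sketch, with made-up shift values rather than the actual 1ET1 data:

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equally long lists of shifts (ppm)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# Hypothetical values for three amide protons (not real data).
procs_shifts = [8.0, 7.5, 9.0]
qm_shifts = [8.3, 7.1, 9.0]
value = rmsd(procs_shifts, qm_shifts)
```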

The empirical methods do not agree well with the DFT results. The DFT chemical shifts (CSs) span a relatively large range, while the empirically predicted CSs span a much narrower range. This indicates that the empirical methods are less sensitive to small differences in hydrogen bond geometry.

Reproducing experimental chemical shifts from X-ray structures
QM reproduces small-molecule proton (1H) CSs with an RMSD of xx. However, for 1ET1 the RMSD is yy. The main source of the discrepancy is likely inaccuracy in the hydrogen bond lengths of the x-ray structure, since the chemical shift depends exponentially on the hydrogen bond length.
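To see why an exponential dependence makes small errors in hydrogen bond length matter so much, here is a toy model. The functional form A*exp(-b*r) and both parameter values are assumptions made purely for illustration; they are not the ProCS hydrogen bond term.

```python
import math


def hb_shift(r, amplitude=100.0, decay=3.0):
    """Toy hydrogen bond term in ppm; r is the hydrogen bond length in Angstrom.

    amplitude and decay are made-up illustrative parameters.
    """
    return amplitude * math.exp(-decay * r)


def relative_change(r, dr, decay=3.0):
    """Fractional change of the toy term when the bond length is off by dr."""
    return hb_shift(r + dr, decay=decay) / hb_shift(r, decay=decay) - 1.0


# Effect of a 0.1 Angstrom overestimate of a 2.0 Angstrom hydrogen bond.
change = relative_change(2.0, 0.1)
```

With this made-up decay constant, a 0.1 Å overestimate lowers the term by about 26%, regardless of the bond length itself.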

The ProCS RMSD (0.63 ppm) is similar to the QM RMSD (??) and significantly larger than the RMSD between QM and ProCS, indicating that ProCS is sufficiently accurate. For 13 other proteins the ProCS RMSD is double that of 1ET1, because more of their amide protons are not involved in hydrogen bonds.

The RMSDs for the empirical methods are significantly smaller than that of ProCS. This is also found for 13 other x-ray structures. The reason is that the empirical methods are parameterized using x-ray structures: to produce a low RMSD relative to experiment, they need to be insensitive to errors in the protein structure.

Monte-Carlo implementation of protein structure refinement based on chemical shifts
Why MC? No gradients for ProCS. More efficient sampling based on Bayesian inference. Anything else?
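The "no gradients" point is easy to see in code: a Metropolis step only ever evaluates the energy, never its derivative. The sketch below is a plain textbook Metropolis sampler on a one-dimensional toy energy, not the Bayesian-inference-based sampling mentioned above; the function name, step size and temperature are all hypothetical.

```python
import math
import random


def metropolis(energy, x0, n_steps, step_size=0.5, kT=1.0, seed=0):
    """Plain Metropolis sampling; only energy evaluations are needed."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)  # symmetric trial move
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


# Toy energy with its minimum at x = 1 (stands in for force field + CS terms).
samples = metropolis(lambda x: (x - 1.0) ** 2, x0=0.0, n_steps=5000)
mean_x = sum(samples) / len(samples)
```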

Refining protein structures based on chemical shifts

We refine the structures of three proteins, each using three energy functions: OPLS alone, OPLS+ProCS and OPLS+Camshift. The MC refinement of a protein structure results in an ensemble of X (?) protein structures, from which average chemical shifts for each amide proton are computed.
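The two ingredients of this step, a combined energy function and per-proton ensemble averages, can be sketched as follows. The harmonic form of the chemical shift penalty and the weight parameter are assumptions for illustration only; the text above does not specify how the CS term enters the energy.

```python
def hybrid_energy(force_field_energy, predicted, experimental, weight=1.0):
    """Force field energy plus a chemical shift penalty (hypothetical harmonic form)."""
    cs_penalty = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return force_field_energy + weight * cs_penalty


def ensemble_average_shifts(ensemble_shifts):
    """Average the shift of each amide proton over all structures in the ensemble.

    ensemble_shifts holds one list of shifts (ppm) per structure, same proton order.
    """
    n_structures = len(ensemble_shifts)
    return [sum(per_proton) / n_structures for per_proton in zip(*ensemble_shifts)]


# Made-up shifts for two amide protons in a two-structure ensemble.
averages = ensemble_average_shifts([[8.0, 7.0], [9.0, 7.5]])
energy = hybrid_energy(-10.0, predicted=[8.0], experimental=[8.5], weight=2.0)
```

The averaged shifts, not those of any single structure, are what get compared to experiment.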

The average ProCS chemical shifts are in better agreement with experiment than those computed from the x-ray structures. The hydrogen bond geometries in the ensemble are very close to those in the x-ray structures. The average Camshift chemical shifts, in contrast, are not in better agreement with experiment than those computed from the x-ray structures. The hydrogen bonds in this ensemble are longer than in the x-ray structures, because OPLS prefers longer hydrogen bonds.

Trans-hydrogen bond coupling constants

Better agreement with x-ray structures does not necessarily imply better solution-phase structures, so we computed average trans-hydrogen bond coupling constants and compared them to experiment. The coupling constants based on the ProCS-refined ensembles are indeed in better agreement with experimental values, indicating that the refinement led to improved hydrogen bond geometries.

Summary

ProCS is a QM-based backbone amide proton chemical shift predictor that can deliver QM-quality CS predictions for a protein structure in less than a second. Agreement with experiment is worse than for empirically predicted CSs, but we show that this is because empirical CS predictors are insensitive to small errors in hydrogen bond geometry in the x-ray structures, which in turn is because they are parameterized using such x-ray structures. The agreement between ProCS CSs and experiment can be improved by refining the protein structure using an energy function that includes a force field term and a CS term. This also results in better predicted trans-hydrogen bond coupling constants, indicating that the refined protein structures have indeed improved. A similar refinement using Camshift CSs leads to a worse protein structure by the same criterion.

Empirical CS predictors yield CSs that agree better with experiment. However, they are relatively insensitive to protein structure and are therefore not suitable for protein structure refinement.