Proteins and Wave Functions: Predicting amide nitrogen (15N) chemical shifts in proteins: where do we go from here?

Sunday, March 15, 2015

As I wrote over on CCH: "The H-N HSQC spectrum of protein amide groups is one of the most frequently recorded experiments in protein NMR. 15N labeling is comparatively inexpensive and the spectrum can be acquired in a relatively short period of time." In fact I have sometimes seen HSQC spectra used "simply" as a proof that the protein is folded and then "thrown away". But can we extract more information? For example can we deduce the structural changes due to mutations or ligand binding? Can we discriminate between two similar predicted structures as part of protein structure determination?

Unfortunately, predicting amide N chemical shifts in proteins appears to be very difficult
Zhu, He and Zhang predicted amide N chemical shifts at the B3LYP/6-31G(d,p) level of theory (using an implicit solvation model) for Protein G and ubiquitin with mean unsigned errors (MUEs) of 4.8 and 8.0 ppm, respectively. And this after a linear fit to experiment, with $R^2$ values of 0.71 and 0.69. For comparison the MUEs for CA are 2.1 and 1.4 ppm, with $R^2$ values of 0.79 and 0.88. This was done using a single AMBER optimized structure.
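To make the "MUE after a linear fit" metric concrete, here is a minimal sketch of how such numbers can be computed. The shift values and the function names are made up for illustration and are not taken from any of the studies mentioned here; the fit is an ordinary least-squares line of experimental against computed shifts.

```python
# Hypothetical illustration: these shift values are invented, not data from any study.

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mue_and_r2_after_fit(computed, experimental):
    """Fit experimental ~ computed, then return (MUE, R^2) of the fitted values."""
    slope, intercept = linear_fit(computed, experimental)
    fitted = [slope * c + intercept for c in computed]
    residuals = [f - e for f, e in zip(fitted, experimental)]
    mue = sum(abs(r) for r in residuals) / len(residuals)
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    r2 = 1.0 - ss_res / ss_tot
    return mue, r2


# Made-up computed and experimental 15N shifts (ppm) for five residues.
computed = [118.0, 125.5, 109.2, 131.0, 121.7]
experimental = [120.1, 123.0, 112.4, 127.9, 119.5]

mue, r2 = mue_and_r2_after_fit(computed, experimental)
print(f"MUE = {mue:.2f} ppm, R^2 = {r2:.2f}")
```

Note that the fit removes any systematic offset and scaling error before the MUE is evaluated, so an MUE reported this way is a lower bound on the error of the unfitted predictions.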

Exner et al. used 500 structures from a 5 ns explicit solvent MD of the HA2 domain to predict amide N chemical shifts at the B3LYP/6-31G(d) level of theory. The MUE was 14.6 ppm and the $R^2$ value was 0.81. The corresponding values for C atoms (note: not just CA) are 5.2 ppm and 0.99, and the $R^2$ value for aliphatic carbons is 0.99 (I couldn't find a separate MUE).

Has an amide N chemical shift ever been predicted accurately from first principles?
Short answer: No - as far as I can tell. A few N chemical shifts have been measured in the gas phase and these can be reproduced to within 1.5 - 2.0 ppm using CCSD(T)/CBS//CCSD(T)/cc-pVTZ. Here vibrational contributions and basis set extrapolation were key with contributions as large as 5 ppm. Unfortunately, none of the molecules contained an amide group.

While amide N chemical shifts have not been measured in the gas phase, there is plenty of data in solution - including data for small prototypical amides such as formamide, acetamide, and N-methyl acetamide that are within reach of CCSD(T)/CBS. However, one conclusion from this data (and similar data for ammonia) is that solvent effects can be very large (5-15 ppm) and that a low dielectric solvent is not representative of the gas phase when it comes to N chemical shifts. So (1) one cannot simply use a continuum representation of the solvent and (2) one might as well go for aqueous solution since that is most relevant for proteins and not necessarily harder than other solvents.

Exner, Möller, and co-workers attempted an N chemical shift prediction for N-methyl acetamide in aqueous solution based on 5000 explicit solvent CPMD snapshots, but the results for N are ambiguous since predicting the chemical shift requires an equal treatment of the reference, which they didn't do (see more on this below). However, one very encouraging result was that, for example, the H(N) chemical shift was predicted to be 5.2 ppm higher than that of the neighboring H(C), which is in excellent agreement with the experimental value of 5.1 ppm. For comparison the corresponding predicted value in the gas phase is 2.7 ppm.

Where do we go from here?
It's important to figure out what it takes to predict accurate amide N chemical shifts in aqueous solution. One option is small model systems and here we either have to deal with the reference problem or look at molecules with more than one N atom. Another is to look more closely at select protein residues. Either way, we need to get some basics straight so we are not fumbling in the dark.

1. The Basics
a. CCSD(T)/CBS + vibrational correction (a la Teale et al.) benchmark N chemical shift values for formamide, acetamide, and N-methyl acetamide. Moon and Case have done this for the latter, but using an MP2/6-31G(d) structure, which I am not sure is good enough.

b. Corresponding benchmark values for the effect of hydrogen bonding (e.g. to water molecules) and dihedral angle changes. Can we assume that the vibrational effects are unchanged?

2. Small models
a. Internal reference. Look for experimental data for small amide-containing compounds with only one conformation and no significant pH or tautomer effects. I had a very quick look at some tables and found one (not ideal) candidate: hydantoin. Kricheldorf has measured a difference in N chemical shift of 63.9 ppm in 20% w/w aqueous solution. It is not ideal because neither N is, strictly speaking, in an amide group. Ideal model systems would be substituted diglycolyldiamides - like this study but in aqueous solution.
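The appeal of an internal reference can be made explicit. Assuming the usual approximation that a chemical shift is the reference shielding minus the shielding of the nucleus (the symbols $\sigma_{\mathrm{ref}}$, $\sigma_{\mathrm{N_a}}$ and $\sigma_{\mathrm{N_b}}$ are introduced here for illustration), the reference cancels in the difference between two N atoms in the same molecule:

```latex
\begin{align}
\delta_{\mathrm{N_a}} &\approx \sigma_{\mathrm{ref}} - \sigma_{\mathrm{N_a}}, \qquad
\delta_{\mathrm{N_b}} \approx \sigma_{\mathrm{ref}} - \sigma_{\mathrm{N_b}} \\
\delta_{\mathrm{N_a}} - \delta_{\mathrm{N_b}} &\approx \sigma_{\mathrm{N_b}} - \sigma_{\mathrm{N_a}}
\end{align}
```

So a measured shift difference such as the one for hydantoin can be compared directly to a computed shielding difference, without ever computing the reference compound.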

b. External reference. Exner, Möller and co-workers predicted an N chemical shift for N-methyl acetamide (NMA) of 159.99 ppm, which they compared to an experimental value of 113.8 ppm. However, the computed shift was relative to computed gas phase ammonia, while the experimental value was referenced to liquid ammonia. One solution is to try to make a similar prediction for liquid ammonia. Another is to compare two aqueous phase predictions: e.g. repeat the study for acetamide and compare the two N chemical shifts to experimental measurements.

Yet another approach is this one: the experimental chemical shielding difference between gas phase and liquid ammonia is 19.1 ppm, so the predicted chemical shift relative to liquid ammonia is 140.9 ppm, which is already a little better. The prediction was done with B3LYP/cc-pVTZ and that certainly contains some error. To find out how much, we need CCSD(T)/CBS values, which we don't have (Moon and Case don't appear to give the numerical values for NMA). The best I could find so far was MP2/cc-pVQZ, which gives an NMA gas phase N chemical shift relative to gas phase ammonia of 114.97 ppm, which is 11.45 ppm lower than the corresponding B3LYP/cc-pVTZ value. So 140.9 - 11.4 = 129.5 ppm is a better estimate. Finally, because of the CPMD, the predicted amide chemical shielding includes some vibrational averaging, but that of ammonia does not. Teale et al. predict that vibrational effects lower the chemical shielding of ammonia by 8.7 ppm, so 129.5 - 8.7 = 120.8 ppm, which is starting to approach the experimental value of 113.8 ppm. If we had the CCSD(T)/CBS value and vibrational corrections for NMA, it is not inconceivable that we could get closer to experiment.
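The chain of corrections above is simple bookkeeping, and a small sketch makes it easy to redo when better numbers become available. All values are the ones quoted in the text; the variable and function names are just illustrative. The sketch carries unrounded values through, so the last digit can differ slightly from figures that were rounded at each step.

```python
# Values quoted in the text (ppm); the names are illustrative, not from any paper.
SHIFT_VS_GAS_NH3 = 159.99   # CPMD + B3LYP/cc-pVTZ NMA shift vs computed gas phase ammonia
GAS_TO_LIQUID_NH3 = 19.1    # experimental shielding difference, gas phase vs liquid ammonia
B3LYP_MINUS_MP2 = 11.45     # B3LYP/cc-pVTZ minus MP2/cc-pVQZ gas phase NMA shift
NH3_VIBRATIONAL = 8.7       # vibrational lowering of the ammonia shielding (Teale et al.)
EXPERIMENT = 113.8          # experimental NMA shift vs liquid ammonia


def apply_corrections(start, corrections):
    """Subtract each correction in turn and return every intermediate estimate."""
    estimates = [start]
    for correction in corrections:
        estimates.append(estimates[-1] - correction)
    return estimates


labels = [
    "relative to gas phase ammonia",
    "re-referenced to liquid ammonia",
    "after method correction (MP2 vs B3LYP)",
    "after ammonia vibrational correction",
]
estimates = apply_corrections(
    SHIFT_VS_GAS_NH3,
    [GAS_TO_LIQUID_NH3, B3LYP_MINUS_MP2, NH3_VIBRATIONAL],
)

for label, value in zip(labels, estimates):
    # Report each estimate and how far it still is from experiment.
    print(f"{label:40s} {value:7.2f} ppm  (error {value - EXPERIMENT:+.2f} ppm)")
```

Each correction is treated as additive and independent of the others, which is itself an assumption: for instance, the MP2 vs B3LYP difference is a gas phase number applied to a solution phase prediction.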

The corresponding result for NMA based on the approach typically applied to proteins (classical MD using fixed bond lengths and B3LYP/6-31G(d) chemical shifts) is 77.7 ppm, so the CPMD study already tells us something about why the usual approach taken for proteins might fail.

3. Select protein residues

I also think that the approach we took for amide protons could yield some insights for amide N atoms, i.e. build some detailed structural models (including high level geometry optimizations) of protein regions with well defined secondary structure and (at least initially) as far away from the solvent and charged groups as possible. De Dios, Pearson and Oldfield took a similar approach and got promising results for relative amide N chemical shifts of select Val residues in SNase. If we can consistently get accurate (e.g. within 2 ppm) predictions for amide N chemical shifts, we can use the approach to try to understand the outliers.