Sunday, June 26, 2016

Planned papers for 2016 - six months in

Pedro's post reminded me that mine was due.

In January I wrote about the papers I plan to publish and made this list:

Submitted
None

Probable
1. Benchmarking of PM6 and DFTB3 for barrier heights computed using enzyme active site models.
2. pKa prediction using PM6 - part 1
3. Protein structure refinement using ProCS15 - starting from x-ray structure

Maybe
4. PM6 for all elements in GAMESS, including PCM interface
5. Protein structure refinement using ProCS15 - starting from 5 Å Cα RMSD
6. Vibrational effects on N amide chemical shifts
7. pKa prediction using PM6 - amines
8. Predicting binding free energies for CB7
9. Linear scaling HF-3c calculations by interface to FMO2 in GAMESS
10. Side chain chemical shift prediction with ProCS15
11. Rienstra-like chemical shift assignment in PHAISTOS

The status is:

Published 

Submitted
Received a "minor revision" verdict. Sent the revised version back last week.

Probable
3. Protein structure refinement using ProCS15 - starting from x-ray structure
Actively working on it. The first draft is about 2/3 done. It's a huge amount of work because I'm still new to the field and am learning as I write. However, I am fairly confident that I'll get it published in 2016.

7. pKa prediction using PM6 - amines
The key question here is whether we can automate all steps of the pKa calculation, and based on what we have learned so far I am pretty sure we can. The main challenge is the protonation step. Once paper 3 is done I will start working on this. The CPU requirements are not an issue, so I am fairly confident that I'll get it published in 2016.

Maybe in 2016
*. Improved prediction of chemical shifts using machine learning
This wasn't even on the drawing board in January. Lars spent a few months in Anatole von Lilienfeld's lab working on increasing the accuracy of the ProCS15 data set using machine learning. Calculations are still ongoing, so I am a little hesitant to list it under "probable".
A companion paper in Scientific Data has also been discussed.


Probably not in 2016
4. PM6 for all elements in GAMESS, including PCM interface
Jimmy is working on the first part. The interface is there, but debugging is really tough. I think we will get it working in 2016, but getting a paper published this year is unlikely.

5. Protein structure refinement using ProCS15 - starting from 5 Å Cα RMSD
I think we will get most of the calculations done in 2016.

6. Vibrational effects on N amide chemical shifts
I had a visiting student working on this. Bottom line: there is still no bug-free, black-box approach for computing vibrational effects that just works. Much harder problem than I had anticipated.

8. Predicting binding free energies for CB7
Have some data but a long way to go yet.

9. Linear scaling HF-3c calculations by interface to FMO2 in GAMESS
This is actually working and is being incorporated into the official version of GAMESS. Not sure when we'll get around to generating data for a paper.

10. Side chain chemical shift prediction with ProCS15
Susanne is working on this but I doubt we will publish a paper on it in 2016. 

11. Rienstra-like chemical shift assignment in PHAISTOS
Still just an idea so far.



This work is licensed under a Creative Commons Attribution 4.0

Thursday, June 16, 2016

Reviews for Prediction of pKa values using the PM6 semiempirical method

2016.06.21 Update: Here's our response

Reviews of our latest PeerJ submission are in after only 15 days. This must be some kind of record!

Personal comments from the editor:

Table 4 shows dramatic differences between PM6-D3H+ and PM6 although the previous tables did not show very large differences between both semiempirical methods. Please discuss this.

How do the errors in PM6 or PM6-D3H+ gas-phase protonation energies (vs. experiment or high level computation) change when moving from primary to secondary and tertiary amines? I believe that the addition of a table with these data (with each tested amine treated separately) would be very helpful for the readers and future practitioners.


Reviewer 1 (Anonymous)

Basic reporting

This is an interesting manuscript, but a frustrating aspect is that the experimental pKa values used for comparison are not included for most compounds. These could easily be added to Table 1. In fact, the best solution would be to modify Table 2 to give the calculated pKa values from the various methods along with the experimental values. Also, the authors should refer somewhere to the very relevant PM6 pKa calculations by Jimmy Stewart given in http://openmopac.net/pKa_table.html.

Experimental design

In general this work is properly designed.

Validity of the findings

It would be very helpful if the authors would provide a figure comparing the calculated and experimental values, and include in the text the relevant equation with proper statistics (n, r2, s, F) along with the uncertainties for the slope & intercept. (See, e.g., the book by Shields & Seybold on this topic, or their WIRES article.)

Comments for the Author

After improvements, this manuscript will be of interest to many people attempting to calculate pKas, especially those dealing with high throughput applications.
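Reviewer 1's request for a calculated-versus-experimental fit reported with n, r², s, F and the uncertainties of the slope and intercept boils down to ordinary least-squares statistics. Below is a minimal Python sketch, assuming a simple linear regression of experimental pKa (y) on calculated pKa (x). The function name `regression_stats` and the example numbers are hypothetical illustrations, not data or code from the paper, and whether these match the exact conventions in Shields & Seybold is an assumption.

```python
import math


def regression_stats(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns n, r2, s (standard error of the estimate), F, and the
    standard errors of the slope and intercept.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares; n - 2 degrees of freedom for a two-parameter fit
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_res / (n - 2))
    r2 = 1.0 - ss_res / syy

    # F for one predictor: regression mean square / residual mean square
    f_stat = (syy - ss_res) / (ss_res / (n - 2)) if ss_res > 0 else math.inf

    # Standard errors (uncertainties) of the slope and intercept
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1.0 / n + mean_x ** 2 / sxx)

    return {"n": n, "slope": slope, "intercept": intercept, "r2": r2,
            "s": s, "F": f_stat, "se_slope": se_slope,
            "se_intercept": se_intercept}


if __name__ == "__main__":
    # Hypothetical calculated vs. experimental pKa values, for illustration only
    calc = [9.5, 10.2, 10.8, 9.9, 11.0]
    expt = [9.8, 10.6, 10.7, 10.1, 11.2]
    print(regression_stats(calc, expt))
```

The hand-written formulas keep the sketch dependency-free; in practice a statistics package would report the same quantities.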


Reviewer 2 (Anonymous)

Basic reporting

The paper is well-written and organised in a manner that was easy to read. I did find the background / literature research on the short side. Specifically, the isodesmic or proton exchange scheme was developed quite some time ago by various groups. See for example: (a) http://dx.doi.org/10.1063/1.1337862 (b) 10.1021/ct800335v and (c) 10.1021/jp107890p. These studies have laid out quite clearly the effectiveness of an isodesmic scheme for error cancellation, as well as its limitations (e.g. the need for a structurally similar reference with accurately known pKas). Another minor point is there should be a footnote to explain what "**" in Table 2 means.

Experimental design

The research question is well-defined, namely whether contemporary semi-empirical methods can provide cost-effective predictions of pKas. I do have a number of suggestions for improvement:

(1) Computational methods: It was not clear how the solvation free energies were computed - e.g. were these done on gas phase or solution phase optimised geometries? Strictly speaking, the gas and solution phase components of the solvation free energy should be computed on geometries optimised in the respective phases. How sensitive are the results to this choice?

(2) There is a lot of data condensed into the Tables which could actually be used to provide even deeper insights. For example, I would love to see a breakdown of the solution phase energies into the gas phase and solvation contributions as laid out in eqn (6). This would be useful for identifying the sources of errors especially for the outliers.

(3) The dataset molecules in Table 1 are structurally very similar (the substituents are mostly aliphatic groups). It would be interesting to see a more diverse selection of molecules (e.g. EWG and EDG) as the authors alluded to in their conclusion.

Validity of the findings

I think the conclusions are fair based on the results presented. However, I do recommend the authors consider my earlier suggestions to provide clearer insights as to why semi-empirical methods can sometimes fail badly even for isodesmic reactions. This will spur further research into improving these methods.
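The isodesmic (proton exchange) scheme Reviewer 2 mentions, together with the gas-phase/solvation breakdown requested in point (2), can be sketched as follows. For the exchange reaction BH+ + Ref → B + RefH+, pKa(BH+) = pKa(RefH+) + ΔG_exchange/(RT ln 10), and since each solution-phase free energy is a gas-phase free energy plus a solvation free energy, ΔG_exchange splits into a gas-phase term and a solvation term. This is a minimal Python sketch assuming free energies in kcal/mol at 298.15 K. The function names and dictionary keys are hypothetical, and we do not claim it reproduces eqn (6) of the manuscript exactly.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def exchange_free_energy(g_b, g_refh, g_bh, g_ref):
    """Free energy change for BH+ + Ref -> B + RefH+ (products minus reactants)."""
    return (g_b + g_refh) - (g_bh + g_ref)


def isodesmic_pka(pka_ref, gas, solv, temperature=298.15):
    """Return pKa(BH+) and its gas-phase and solvation contributions.

    gas and solv map the species keys "B", "BH+", "Ref", "RefH+" to
    gas-phase free energies and solvation free energies (kcal/mol).
    pka_ref is the known pKa of the reference acid RefH+.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10)  # ~1.364 kcal/mol per pKa unit
    dg_gas = exchange_free_energy(gas["B"], gas["RefH+"], gas["BH+"], gas["Ref"])
    dg_solv = exchange_free_energy(solv["B"], solv["RefH+"], solv["BH+"], solv["Ref"])
    return {
        "pKa": pka_ref + (dg_gas + dg_solv) / rt_ln10,
        "gas_contribution": dg_gas / rt_ln10,          # shift from pka_ref due to gas phase
        "solvation_contribution": dg_solv / rt_ln10,   # shift due to solvation
    }


if __name__ == "__main__":
    # Hypothetical relative free energies (kcal/mol), for illustration only
    gas = {"B": 0.0, "BH+": -1.0, "Ref": 0.0, "RefH+": 0.0}
    solv = {"B": 0.0, "BH+": 0.5, "Ref": 0.0, "RefH+": 0.0}
    print(isodesmic_pka(10.0, gas, solv))
```

Reporting the two contributions separately, in pKa units, makes it easy to see whether an outlier comes mainly from the gas-phase protonation energy or from the solvation model.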


Reviewer 3 (Anonymous)

Basic reporting

- Line 132: change "can play and important role" to "can play an important role"

- Citations need to conform to the journal style throughout: see, e.g., "taken from (Morgenthaler et al., 2007)" in Table 4 caption should be changed to "taken from Morgenthaler et al. (2007)"

- References in the bibliography need to be consistently formatted to journal guidelines

- Other groups have reported validation efforts for predicting pKa values using the PM6 method (see, e.g., Rayne et al. [2009], Juranić [2014], etc.). The authors should cite and incorporate the findings of all these prior PM6 pKa validation efforts into the current study to demonstrate an understanding of the prior literature in this field. Otherwise, it looks as though the authors are attempting to make their research appear more novel than it actually is.

Experimental design

- Experimental design is appropriate.

Validity of the findings

- The findings appear valid.