Showing posts with label reviews. Show all posts

Thursday, February 21, 2019

Reviews of Graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space

Here are the reviews of my latest paper, which just appeared in Chemical Science. I submitted the paper on December 1, 2018 and got these reviews on January 13, 2019. I resubmitted on January 20 and got the final decision on February 8. As usual with Chem. Sci., a very efficient and positive experience. Kudos to Geoff Hutchison for signing the review (and being cool with me sharing it here) and kudos to Chem. Sci. for passing it on to me.

REVIEWER REPORT(S):
Referee: 1

Recommendation: Revisions required

Comments:
The main question of this paper is whether GA-based algorithms perform better than deep learning. This paper includes interesting comparisons and free-to-use software, which is potentially a good resource for the AI-chemistry community. Yet I have the following reservations about this paper.

1) Essentially, the GA-based method is faster than ML, creating more molecules. The logP computation is extremely fast, allowing the GA to create many molecules in a fixed period of time. When the simulation takes longer (like DFT), it may be beneficial to spend more time on design. It is necessary to give a comparison in terms of the number of simulations needed to obtain good molecules. In that case, ML might be better, because it might be creating "high-quality" molecules using more time. Please provide comparisons in this respect.

2) It seems like the author tuned GA parameters such that the molecules are "realistic looking". It would be beneficial to readers if you can elaborate on this aspect. What exactly did you mean by "realistic looking"? Can you quantify somehow?

3) It seems to me that GA crossover parameters are inspired by chemical reactions. Can you claim that GA-based molecules are more synthesizable than deep-learning-based ones?


Referee: 2

Recommendation: Accept

Comments:
The manuscript “Graph-based genetic algorithm and generative model Monte Carlo tree search for the exploration of chemical space” by Jan Jensen is an excellent addition to recent work on using computational methods to generate new molecular compounds for target properties.

I will admit up-front that I am a proponent of GA strategies, so the conclusions were not surprising. I think the work should be published but would like to make some minor suggestions that I think will strongly improve the work.

- On page 2, the number of non-H atoms is described coming “from a distribution with a mean 39.15 and a standard deviation of 3.50” - this is a number without a unit. Based on the article, I think it should read “a mean of 39.15 atoms, with standard deviation of 3.50 atoms”
- The last paragraph on page 3 is perhaps a bit technical for the Chemical Science audience, discussing “leaf parallelization” and “leaf nodes.” I think the whole paragraph needs to be written for a general audience (i.e., not someone implementing a MCTS code) or moved to the supporting information. The code is, after all, open source and available.
- The penalized logP score could be described better. From the text, the penalty for “unrealistically large rings” was not described.
- The J(m) scores in Table 2 could perhaps be a bit expanded. For example, my assumption is that the SA scores and/or ring penalties may be higher in some methods than others. I think it would be useful to add columns for the raw logP, SA, and penalty scores - if not in the text then in the supporting information.
- The results and discussion could benefit from a figure indicating the rate of improvement with generations for the GA methods and/or the GB-GM-MCTS methods. We have, for example, shown that GA methods show high rate of improvement in early generations, but finding beneficial mutations slows over time. This would likely explain why the lower mutation rate shows better performance in this work - and moreover suggest an “early stop” (e.g., are all 50 generations needed for this problem?)
- Similarly, I find it strange that the author didn’t try longer runs or attempt to find an optimal mutation rate for the GA, particularly if the CPU time is so short.
- The caption for Table 3 includes a typo - I believe “BG-GM” should read “GB-GM”
- The conclusions suggest that the GB-GA approach “can traverse a relatively large distance in chemical space” - the author should really use similarity scores (e.g., a Tanimoto coefficient using ECFP fingerprints or similar) to quantify this - again, the discussion could be expanded.
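For readers who have not met the penalized logP score that several of these bullets refer to, here is a minimal sketch. It assumes the unnormalized form that is common in the literature, J(m) = logP(m) - SA(m) - RingPenalty(m), with the ring penalty taken as the number of atoms by which the largest ring exceeds six. The function names and the penalty definition are assumptions for illustration; the paper's exact normalization is not given in this post.

```python
def ring_penalty(largest_ring_size):
    """Assumed penalty: atoms in the largest ring beyond six (0 for small rings)."""
    return max(0, largest_ring_size - 6)


def penalized_logp(logp, sa_score, largest_ring_size):
    """Unnormalized penalized logP: J = logP - SA - ring penalty (assumed form)."""
    return logp - sa_score - ring_penalty(largest_ring_size)


def score_breakdown(logp, sa_score, largest_ring_size):
    """Return the raw terms next to J, i.e. the extra columns the referee asks for."""
    return {
        "logP": logp,
        "SA": sa_score,
        "ring_penalty": ring_penalty(largest_ring_size),
        "J": penalized_logp(logp, sa_score, largest_ring_size),
    }
```

Reporting the three raw terms next to J(m) makes it easy to see whether a high score comes from logP itself or from a low SA or ring penalty.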

Overall, I think it’s a great addition to the discussion on optimization of molecular structures for properties.

-Geoff Hutchison, University of Pittsburgh
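The Tanimoto coefficient suggested in the last bullet is simple to compute once each molecule is represented by the set of "on" bits of its fingerprint. The sketch below is pure Python and assumes the fingerprints (for example ECFP bits produced by a cheminformatics toolkit) are already available as collections of integers. The function names are illustrative and are not taken from the paper's code.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as collections of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen here (an assumption): two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(union)


def tanimoto_distance(fp_a, fp_b):
    """One minus the similarity, a simple proxy for distance in chemical space."""
    return 1.0 - tanimoto(fp_a, fp_b)
```

Comparing each final molecule with its starting structure in this way would put a number on how far the search has travelled.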


Tuesday, September 11, 2018

Reviews of Solvation Energy Predictions Using The SMD Solvation Method and Semiempirical Electronic Structure Methods

Really late posting this. The paper is already out at JCP. Here are the reviews for the record.

Reviewer #1 Evaluations:
Recommendation: Revision
New Potential Energy Surface: No

Reviewer #1 (Comments to the Author):

In this contribution, the authors report a set of systematic analyses of semi-empirical (NDDO and DFTB) methods combined with continuum solvation models (COSMO and SMD) for the description of solvation free energies of well-documented benchmark cases (the MNSOL dataset). They found that the performance of NDDO and DFTB continuum solvation models can be substantially improved when the atomic radii are optimized, and that the results are most sensitive to the radii of HCNO. Another interesting observation is that the optimized radii have a considerable degree of transferability to other solvents.

Since an efficient computation of solvation free energies and related quantities (e.g., pKa values) is valuable in many chemical and biological applications, the results of this study are of considerable interest to the computational chemistry community.

In the current form, the ms can benefit from further discussion of several points:

1. The authors chose to optimize the atomic radii based entirely on element type (e.g., HCNOS). In the literature, many solvation models either further consider atom types (e.g., UAKS) or atomic charge distribution (based on either atomic point charges or charge density); in many cases, a higher degree of accuracy appears to be obtained. It would be useful to further clarify the principle behind the current optimization and the expected level of accuracy; for example, to what degree should we expect the same set of radii to work well for both neutral and ionic (especially anionic) species?
2. Although it is well known - it is useful to explicitly point out that the experimental values for neutral and charged species have different magnitudes of errors.
3. It would be informative to further dissect/discuss the physical origins for the errors of NDDO/DFTB continuum solvation models. For example, are the larger errors (as compared to, for example, HF based calculations) due primarily to the less reliable description of the solute charge density (e.g., multipole moments) or solute polarizability? Discussion along this line might be relevant to the transferability of the optimized model to non-aqueous solvents.

4. Cases with very large errors deserve further analysis/discussion - for example, some neutral solutes apparently have very large errors at HF, NDDO and DFTB levels - as much as 20 or even 30 kcal/mol! What are these molecules? Are the same set of molecules problematic for all methods? What is the physical origin for these large errors?
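The question in point 4 of whether the same molecules are problematic for all methods amounts to a set intersection. A minimal sketch, assuming the signed errors are stored per method in a dictionary keyed by molecule name (the data layout and the function name are hypothetical):

```python
def shared_outliers(errors_by_method, threshold):
    """Return the molecules whose absolute error exceeds threshold in every method.

    errors_by_method maps a method name to a dict of {molecule: signed error},
    with errors in the same units as threshold (e.g. kcal/mol).
    """
    outlier_sets = [
        {mol for mol, err in errors.items() if abs(err) > threshold}
        for errors in errors_by_method.values()
    ]
    if not outlier_sets:
        return set()
    return set.intersection(*outlier_sets)
```

Molecules that survive the intersection point to a problem shared by all methods (or to the reference data), while molecules flagged for only one method point to that method.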


Reviewer #2 Evaluations:
Recommendation: Revision
New Potential Energy Surface: No

Reviewer #2 (Comments to the Author):

In this paper, the authors make the case for efficient solvation models in
fast electronic structure methods (currently heavily utilized for high-throughput
screening approaches). They extend an implementation of PM6 in the Gamess
program to account for d orbitals. The SMD and COSMO continuum models in combination with
various semi-empirical NDDO and also DFT tight-binding approaches are considered.
Their analysis clearly highlights deficiencies of the semi-empirical approaches
compared to HF/DFT. The authors then proceed to propose a remedy (changing the
radii for H, C, O, N, and S). Although this change was driven by data on aqueous
solvation energies, the authors find that other polar solvents (DMSO, CH3CN, CH3OH)
are also improved, which is a sign of transferability of this simple fix.
The prediction of pKa data, as an important application field, concludes the
results section. The paper is clearly written, however, it raises questions that
should be addressed in a revision:
1) Table 2 shows very (too) small Coulomb radii for H and on page 6 this is commented on.
The authors note that for radii smaller than 0.6 A the proton moved into the solvent.
However, no further analysis is provided. I assume that this is due to an increased outlying
charge and this outlying charge should be quantified. Apparently, some error compensation
is in operation. This also relates to the statement 'error for the ions is considerably larger
than for neutral molecules' on page 5. Error compensation also raises a concern about
transferability that the authors must address.
2) The authors should also review their list of references (I assume that the first author's
surname in Ref. 1 is misspelled, the abbreviations of journals are not in JCP style, Ref. 34
appears to lack a journal title, Ref. 16 lacks author names ...).
3) Moreover, figures 1-3 lack a label for the y-axis, figure 4 lacks units
on the y-axis.
4) A few typos also need to be removed (see, e.g., "mainly only" -> "mainly on"
on page 2).


This work is licensed under a Creative Commons Attribution 4.0

Tuesday, March 6, 2018

Reviews of Random Versus Systematic Errors in Reaction Enthalpies Computed Using Semi-empirical and Minimal Basis Set Methods

We submitted this paper to ACS Omega on January 31st and the reviews just came back.

Reviewer: 1

Recommendation: Publish after minor revisions.

Comments:
The authors explored the CBH approach proposed by Sengupta and Raghavachari to compute the reaction enthalpy of a series of organic reactions using semi-empirical and low-cost HF/DFT as the low-level method. They also discussed the origin of errors for several cases that exhibited very large errors. The results will be very useful to the computational chemistry community, in terms of identifying effective means to compute reaction energetics and better ways to improve low-cost methods.

I have only a few minor comments:
1. It appears that dispersion correction was not included for DFTB3 and some NDDO methods. For reactions that involve very large molecules, dispersion may make a non-negligible contribution, as found, for example, for Diels-Alder reactions in the recent benchmark analysis by Gruden et al. (J. Comp. Chem. 38, 2171-2185).

2. There are several typos: line 55 of pg 2, "corrections WERE not included"; line 48 of pg 6, there is one additional "is".

3. It might be useful to report and comment on the computational cost for the different approaches. For example, PBEh-3c is still rather expensive compared to the semi-empirical methods.

4. Is there a "simple" explanation for the difference between xTB and DFTB3? For example, does the improved description of frequencies by xTB make a major difference?


Reviewer: 2

Recommendation: Publish after minor revisions.

Comments:
The authors have carried out an analysis of the performance of highly efficient computational methods for the computation of enthalpies of organic reactions using the connectivity-based hierarchy. The methods considered include DFT, HF, and a range of semi-empirical methods. The analysis is clear and some of the reported findings are indeed significant. While the good performance of DFT and HF is consistent with previous results, the lack of significant improvement with semi-empirical methods is particularly noteworthy. The paper is acceptable for publication after the authors address the following comment.

A more detailed analysis is reported for reaction 19 that is an outlier for some methods such as HF. In this system, the larger errors are attributed partly to the presence of the strained oxirane ring. Similarly, reaction 23 poses problems for some semi-empirical methods due to the presence of larger errors involving allene. In light of these observations, it may be useful to add a cautionary note to the range of problems that can be studied with such methods. I suggest a small paragraph to address the potential limitations of the inexpensive methods for such systems containing unusual bonding situations.



Saturday, January 7, 2017

Reviews of Prediction of pKa values for drug-like molecules using semiempirical quantum chemical methods

I have been remiss in posting reviews of my papers. I submitted the paper to Journal of Physical Chemistry A on November 2, 2016, received first round of reviews November 29, and second round of reviews December 12.  The paper was accepted January 5, 2017 and has appeared online.

Round 1
Reviewer(s)' Comments to Author:

Reviewer: 1

Recommendation: This paper is not recommended because it does not provide new physical insights.

Comments:
This is an interesting study on a very important subject: prediction of pKa for drug-like molecules. The standard free energy of a molecule is determined as the sum of the heat of formation/electronic energy and the solvation free energy, and these terms are obtained by various semiempirical QM (SQM) methods and two continuum solvent models. The author used SQM methods as a black box and compared them on the basis of their performance in predicting pKa. This is, however, not justified, since the SQM methods used describe the system under study differently. For example, PM6-DH+ describes H-bonding and dispersion energy well, contrary to e.g. PM3 and AM1. Consequently, structures stabilized by H-bonding and dispersion will be described much better by the former method. Further, PM7 was parametrized to cover dispersion in the core parametrization, contrary to PM6 (and PM3), where it has to be included a posteriori by e.g. the DH+ term. Consequently, PM7 should also be better suited than e.g. PM6. The question arises how well those methods work, and here the performance of these methods should be compared with some higher-level method like DFT.

Further, SQM methods have already been used for protein-ligand interactions in the last 5 years, but these papers were not mentioned at all.

On the basis of the above-mentioned arguments I cannot recommend the paper for publication in JPC.

Reviewer: 2

Recommendation: This paper is publishable subject to minor revisions noted.  Further review is not needed.

Comments:
This is simply excellent work on an important topic. The only thing is that the author could put the importance of his work in an even greater perspective. Semi-empirical methods are becoming increasingly important also in materials science and the pKa is of high importance also in this field, as it is a good indicator of general chemical stability (like it is used in organic chemistry) of molecular (especially organic) materials for technical applications. A recent example is the search for new organic electrolyte solvents for Lithium-air battery devices, where current design principles strongly rely on pKa values (see for instance http://pubs.rsc.org/en/Content/ArticleLanding/2015/CP/C5CP02937F#!divAbstract ).

Round 2
Reviewer(s)' Comments to Author:

Reviewer: 1

Recommendation: This paper is not recommended because it does not provide new physical insights.

Comments:
Since the ms was not modified according to my comments I cannot recommend it for publication.


Reviewer: 3

Recommendation: This paper is publishable subject to minor revisions noted.  Further review is not needed.

Comments:
This paper evaluates a number of semi-empirical quantum mechanical (SQM) methods for their suitability in calculating the pKa’s of amine groups in drug-like molecules, with the hope that these methods can be used for high-throughput screening.  This paper is suitable for publication in the special issue, subject to minor revision.

(a) The paper shows that pKa’s calculated by some SQM methods are sufficiently accurate for high-throughput screening.

(b) Indicate the accuracy of related QM calculations (e.g. Eckert and Klamt) and the relative cost of QM vs SQM calculations (order of magnitude will do)

(c) How much better is the SQM approach than the empirical methods cited by the author? (add a comparison in the tables)

(d) The need for 26 reference compounds for 53 amine groups in 48 molecules is disturbingly high (so much so that the null hypothesis has errors only a factor of 2 larger than the best results). What are the errors in the SQM calculated pKa’s if a much smaller number of reference compounds are used? (e.g. 6 or less)  If the errors are acceptable, this could make it possible to automate the procedure so that it could be used to screen larger sets of molecules extracted from typical industrial databases (10,000 – 10,000,000 compounds).



This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Reviews of Protein structure refinement using a quantum mechanics-based chemical shielding predictor

I have been remiss in posting reviews of my papers. I received this review on November 11, 2016 of a manuscript I submitted to Chemical Science on September 29, 2016.  The paper was accepted November 17 and has appeared online.

REVIEWER REPORT(S):
Referee: 1

Recommendation: Accept

Comments:
Review of 'Protein Structure Refinement Using a Quantum Mechanics-Based
          Chemical Shielding Predictor'


The authors present a method to refine protein structures with respect to
chemical shifts evaluated by their QM-based ProCS15 method. First applications
to a set of different protein structures showed that small structural changes
lead to a significant reduction of the RMSD.

Empirical methods to predict NMR shifts have been shown to be able to deliver
results that correlate well with experiment at almost no computational cost,
in particular in comparison with quantum chemical methods. However, these
methods are also insensitive with respect to structural changes of the
molecular structure. In this work, the authors analyse their empirical ProCS15
method, which is parametrized based on quantum chemical reference calculations,
with respect to structural changes in the molecular geometry. First examples
show that their method has a similar high sensitivity with respect to structure
changes as quantum chemical methods. The results indicate that ProCS15 can
hold a 'predictive power' beyond previous empirical methods, i.e., in
applications to more exotic molecular geometries and conformations.

The manuscript is well written and of appropriate length, and certainly of
great interest for the readers of Chemical Science. The presented applications
have been thoroughly analyzed and results are well outlined for the reader.
Since I have only a few comments/suggestions, no further revision prior to
publication is necessary. However, I would strongly suggest to consider my
suggestion on the ordering of sections (see below).

Comments:

+ My main point is actually regarding to the order of sections in the
 manuscript.  Since the different methods used are constantly referred to in
 the result-section, I would recommend to first outline the
 theory/computational methodology and then present the results of the
 illustrative calculations on the test systems.

+ In the summary, the authors mention that their method might be used to
 improve the accuracy of QM or QM/MM calculations of NMR chemical shifts.
 It is certainly difficult to judge the quality of the ProCS15-optimized
 structures objectively, i.e., without referring to secondary properties like
 NMR shifts. However, it would be interesting to see the impact of the
 structural changes in quantum chemical calculations.
 This point might be beyond the scope of this work, but is certainly worthwhile
 to be considered by the authors as a future project.

+ Just a comment on the DFT-based reference calculations used to parametrize
 the ProCS15 method: It might be worthwhile considering the use of the KT2
 functional by Keal and Tozer [JCP 119, 3015 (2003)] and the basis sets
 pcS-x/pcSseg-x by Frank Jensen [JCTC 4, 719 (2008):JCTC 10, 1074 (2014)].
 Both functional and basis sets are optimized for NMR chemical shift
 calculations. A benchmark of those methods was done by Flaig et al. [JCTC 10,
 572 (2014)].



Monday, July 11, 2016

2nd Reviews for Prediction of pKa values using the PM6 semiempirical method

2016.07.15: Update: our rebuttal can be found here.  Manuscript resubmitted.

The 2nd round of reviews on our latest PeerJ submission came in on July 7th.  The first round of reviews and a link to our response can be found here.

Editor's Comments
MINOR REVISIONS
Thank you for your efforts at addressing the reviewer's comments. In spite of that, the reviewers (and myself) still think that additional data should be moved from the Supporting Information to the main text as tables/graphs. Specifically:

-per reviewer 1's request, please include the ref. pKa data in table 1. The Supporting Material deposited in figshare is very complete, but it would enormously help the reader (and make your paper much more persuasive at first reading) if the most salient pieces were included in the paper itself.

- contra reviewer 1's comment, I do acknowledge that p.799 of the quoted Stewart (2008) reference describes the pKa computation procedure which generated the data present in http://openmopac.net/pKa_table.html. This method (also used by Rayne) as well as the method by Juranic, however, do not compute pKa from the energy difference itself, but from an empirical fit of the O-H bond distances and approximate charges (or N and H charges, plus a dummy variable stating whether the amine is primary, secondary or tertiary, for Juranic, 2014). These pKa computation approaches are therefore fundamentally different from the one used in your paper. Your references to this literature in the introduction, however, do not make this clear enough. Please improve this to clearly compare the competing methods for PM6-based pKa computations to your approach.

-Do include the statistical data regarding slope, R-squared and outliers. A motivated reader may easily graph the data you have computed (and which are present in the spreadsheet referred to in your figshare area), but your explanation and discussion would be much more readable, and certainly more persuasive, if you included those graphs, slopes and correlation coefficients in the paper. That analysis shows more clearly than the aggregate tables exactly where PM6 affords better correlation/slope than even CBS-4B3 (pyridines), the identity of the outliers, how poorly all methods (even CBS-4B3) correlate to experimental pKa in amines (in spite of a seemingly low 0.2 MAD for CBS-4B3), etc.

-per reviewer 2's request (and also related to my previous request which I may not have worded clearly) please add data regarding the likely origin of the errors in the outliers: do they come from gas phase energies or the solvation? A simple comparison of the B3LYP gas-phase energy changes (on PM6-optimized geometries, to reduce computational effort) and solvation effects might be enough to tell whether the gas-phase acidities (and/or solvation) of PM6 generally track the DFT results.

Reviewer 1 (Anonymous)

Comments for the Author
Reviewer Comments – Reviewer 1
The authors have not adequately responded to any of the concerns raised in my original review. My original comments are shown first. The authors’ responses are shown next; and my further responses to them are shown below.

(1) Basic reporting 
This is an interesting manuscript, but a frustrating aspect is that the experimental pKa values used for comparison are not included for most compounds. These could easily be added to Table 1. In fact, the best solution would be to modify Table 2 to give the calculated pKa values from the various methods along with the experimental values.

Authors: The values are already provided in Supplementary Materials

The copy I received contained no reference at all to “Supplementary Materials”. If these materials are available directions for accessing them should be clearly presented in the normal position just before the References.

(2) Also, the authors should refer somewhere to the very relevant PM6 pKa calculations by Jimmy Stewart given in http://openmopac.net/pKa_table.html. 

Authors: We already refer to this approach in the introduction (Stewart 2008).

The reference Stewart (2008) concerns proteins and has nothing at all to do with pKa estimates. As clearly indicated, the relevant Stewart study is not a formal publication, but has been made widely available to workers by Stewart on the web page as indicated. Apparently the authors didn’t even bother to look at it.

(3) Validity of the findings 
It would be very helpful if the authors would provide a figure comparing the calculated and experimental values, and include in the text the relevant equation with proper statistics (n, r2, s, F) along with the uncertainties for the slope & intercept. (See, e.g., the book by Shields & Seybold on this topic, or their WIRES article.) 

Authors: The statistical analysis the reviewer refers to is done in the context of a QSAR prediction of pKa from QM data, i.e. to gauge the accuracy a linear fit to be used in the prediction of unknown pKa values. The statistics used in this paper is just aimed at gauging the accuracy of the predicted values and, in our opinion, is more than adequate for the task. If the reviewer can explain how the requested statistics is to be used in the context of the current paper we will be happy to reconsider the request.

This is a standard way to compare not just QSAR results, but any studies in this field. It would be helpful, and I don’t understand the authors’ reluctance to include it.

Annotated manuscript
The reviewer has also provided an annotated manuscript as part of their review:

Reviewer 2 (Anonymous)

Comments for the Author
I thank the authors for the revised manuscript. I would still like the authors to address my second point as to what is the major source of error in these calculations, especially for the outliers. Is it the gas phase energies, or the solvation component?

Thursday, June 16, 2016

Reviews for Prediction of pKa values using the PM6 semiempirical method

2016.06.21 Update: Here's our response

Reviews of our latest PeerJ submission are in after only 15 days.  This must be some kind of record!

Personal comments from the editor:

Table 4 shows dramatic differences between PM6-D3H+ and PM6 although the previous tables did not show very large differences between both semiempirical methods. Please discuss this.

How do the errors in PM6 or PM6-D3H+ gas-phase protonation energies (vs. experiment or high level computation) change when moving from primary to secondary and tertiary amines? I believe that the addition of a table with these data (with each tested amine treated separately) would be very helpful for the readers and future practitioners.


Reviewer 1 (Anonymous)

Basic reporting

This is an interesting manuscript, but a frustrating aspect is that the experimental pKa values used for comparison are not included for most compounds. These could easily be added to Table 1. In fact, the best solution would be to modify Table 2 to give the calculated pKa values from the various methods along with the experimental values. Also, the authors should refer somewhere to the very relevant PM6 pKa calculations by Jimmy Stewart given in http://openmopac.net/pKa_table.html.

Experimental design

In general this work is properly designed.

Validity of the findings

It would be very helpful if the authors would provide a figure comparing the calculated and experimental values, and include in the text the relevant equation with proper statistics (n, r2, s, F) along with the uncertainties for the slope & intercept. (See, e.g., the book by Shields & Seybold on this topic, or their WIRES article.)
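The statistics requested here (n, r2, s, F, and the uncertainties of the slope and intercept) all follow from an ordinary least-squares fit of calculated against experimental values. The sketch below uses only the standard library; the function name and the returned dictionary keys are illustrative choices, not anything prescribed by the reviewer or the cited book.

```python
import math


def linear_fit_stats(x, y):
    """Ordinary least-squares fit y = intercept + slope * x with summary statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired data points")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual (sse) and regression (ssr) sums of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = syy - sse
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    f_stat = ssr / (sse / (n - 2)) if sse > 0 else math.inf
    return {
        "n": n,
        "slope": slope,
        "intercept": intercept,
        "r2": ssr / syy,
        "s": s,
        "F": f_stat,
        "se_slope": s / math.sqrt(sxx),
        "se_intercept": s * math.sqrt(1.0 / n + x_mean ** 2 / sxx),
    }
```

A slope near 1 and an intercept near 0, each within its uncertainty, would indicate that the calculated pKa values track experiment without systematic scaling or offset.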

Comments for the Author

After improvements, this manuscript will be of interest to many people attempting to calculate pKas, especially those dealing with high throughput applications.


Reviewer 2 (Anonymous)

Basic reporting

The paper is well-written and organised in a manner that was easy to read. I did find the background / literature research on the short side. Specifically, the isodesmic or proton exchange scheme was developed quite some time ago by various groups . See for example: (a) http://dx.doi.org/10.1063/1.1337862 (b) 10.1021/ct800335v and (c) 10.1021/jp107890p. These studies have laid out quite clearly the effectiveness of an isodesmic scheme for error cancellation, as well as its limitations (e.g. the need for a structurally similar reference with accurately known pKas). Another minor point is there should be a footnote to explain what "**" in Table 2 means.

Experimental design

The research question is well-defined, namely whether contemporary semi-empirical methods can provide cost-effective predictions of pKas. I do have a number of suggestions for improvement:

(1) Computational methods: It was not clear how the solvation free energies were computed - e.g. were these done on gas phase or solution phase optimised geometries? Strictly speaking, the gas and solution phase components of the solvation free energy should be computed on geometries optimised in the respective phases. How sensitive are the results to this choice?

(2) There is a lot of data condensed into the Tables which could actually be used to provide even deeper insights. For example, I would love to see a breakdown of the solution phase energies into the gas phase and solvation contributions as laid out in eqn (6). This would be useful for identifying the sources of errors especially for the outliers.

(3) The dataset molecules in Table 1 are structurally very similar (the substituents are mostly aliphatic groups). It would be interesting to see a more diverse selection of molecules (e.g. EWG and EDG) as the authors alluded to in their conclusion.

Validity of the findings

I think the conclusions are fair based on the results presented. However, I do recommend the authors consider my earlier suggestions to provide clearer insights as to why semi-empirical methods can sometimes fail badly even for isodesmic reactions. This will spur further research into improving these methods.
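For context, the isodesmic (proton exchange) scheme this reviewer refers to gives the pKa relative to a reference compound from the free energy change of the exchange reaction BH+ + B_ref -> B + B_refH+, via pKa = pKa_ref + ΔG / (RT ln 10). A minimal sketch, assuming ΔG is supplied in kcal/mol; the function and argument names are illustrative:

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def isodesmic_pka(pka_ref, delta_g, temperature=298.15):
    """pKa from the free energy change (kcal/mol) of BH+ + B_ref -> B + B_refH+."""
    return pka_ref + delta_g / (R_KCAL * temperature * math.log(10))
```

At 298.15 K the factor RT ln 10 is about 1.36 kcal/mol, so an error of that size in ΔG shifts the predicted pKa by one unit. That is why the cancellation of errors between the target and a structurally similar reference matters so much.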


Reviewer 3 (Anonymous)

Basic reporting

- Line 132: change "can play and important role" to "can play an important role"

- Citations need to conform to the journal style throughout: see, e.g., "taken from (Morgenthaler et al., 2007)" in Table 4 caption should be changed to "taken from Morgenthaler et al. (2007)"

- References in the bibliography need to be consistently formatted to journal guidelines

- Other groups have reported validation efforts for predicting pKa values using the PM6 method (see, e.g., Rayne et al. [2009], Juranić [2014], etc.). The authors should cite and incorporate the findings of all these prior PM6 pKa validation efforts into the current study to demonstrate an understanding of the prior literature in this field. Otherwise, it looks as though the authors are attempting to make their research appear more novel than it actually is.

Experimental design

- Experimental design is appropriate.

Validity of the findings

- The findings appear valid.

Wednesday, April 13, 2016

Reviewing for PeerJ: it's the little (and the not so little) things

I just did my first review for PeerJ and it was a real pleasure because there are a lot of "little things" that make your reviewing life easier:

1. Figures/tables are in the text and, get this, the captions are immediately above/below the corresponding figure/table.  Some other journals also do this, but not enough.

2. I annotate the mss in a pdf reader and usually this is a frustrating experience since the publisher generated pdf has all sorts of "quirks" that make highlighting and copying text hit and miss.  The previous pdf I reviewed turned every page with a figure into an image!  Annotating/copying in the PeerJ pdf worked flawlessly.

3. The pdf contained 3 front pages with the due date, a summary of the review criteria, a link to the page with the supplementary material, and a link to the page where I should submit my review.  No hunting around for the email with the link! I teared up a little bit when I saw that.

Other "little things" include stuff like not having to rank the perceived importance or impact of the work on some bogus 1-10 scale, a strict policy on making the raw data available, and a button to click to make my review non-anonymous.




This work is licensed under a Creative Commons Attribution 4.0

Thursday, March 31, 2016

Reviews for Towards a barrier height benchmark set for biologically relevant systems

2016.04.06: our rebuttal can be found here

The reviews of our latest PeerJ submission (editor assigned February 25) came back last evening with the always welcome verdict "minor revision".  The preprint has already received over 400 views and 85 downloads.

Reviewer Comments

Reviewer 1 (Xabier Lopez)

Basic reporting
No Comments

Experimental design
No Comments

Validity of the findings
No Comments

Comments for the author
The present papers is a first step towards the generation of a reaction database for biological systems. To do so the authors provide with highly accurate DFT and ab-initio structural and thermodynamic data for a set of five enzyme reactions. The results are compared with various PM# type semi-empirical methods and DFTB3. The work sets up an standard in terms of semi-empirical method validation, that hopefully, it will cristalize in a joint effort with the rest of the theoretical chemist community to create a database of biologically relevant reactions. In this sense, I find this paper with a high potential and should be publised.

Although the set of reactions is small as to draw general conclusions, I understand that the goal of the authors is to set up an standard so that other theoretical chemistry groups worldwide will contribute to this joint effort. I think that this is a very good and necessary idea. I would suggest that it would be useful in the github.com/jensengroup/db-enzymes dataset that the authors would consider to provide with standard inputs for the programs that should be considered as standard for other theoreticians as well. In this way, the desirable growth of this dataset by contributions from other groups will not jeopardize its necessary coherency.

Reviewer 2 (Anonymous)
Basic reporting
The authors present a compilation of model systems for five, real-life enzymatic reactions based on previously published structures, reaction barrier heights and reaction energies that had been obtained with the B3LYP density functional theory (DFT) approximation. These DFT structures and numbers are used as a reference to examine various cost-efficient semi-empirical methods. The authors consider both geometry optimisations, as well as single-point energy calculations. Furthermore, they also comment on the effect of a solvation model on the results.

The main motivation of this study is clearly outlined and it is important to note that the authors point out that their study is only the first step in a long-term undertaking that aims at having a diverse set with more accurate reference energies. The language is clear and professional. Raw data is supplied through a link to figshare and can be easily accessed and examined by the readers of this paper. With regards to cited literature and the figures, I have the following suggestions to make:

a) I think the cited literature should be more comprehensive and also take into consideration recent QM studies on peptides and proteins that allow a better understanding of the quality of the herein used DFT reference values; I will elaborate on this point more in detail under “validity of the findings”. I also noticed that some cited articles lack a volume and page number; I therefore recommend that the authors carefully check their list of references for any mistakes.

b) While the general quality of the figures is very good, I find Figures 3 - 6 hard to read. In these figure, the authors attempt to compare geometries for two levels of theory with each other. While one does note differences, the structures lack a common reference point that would allow the reader to better gauge whether these differences are due to structural changes or due to slightly different viewing angles. Would it be possible to improve these figures and to align the geometries with respect to one of the involved molecules?

Experimental design
I identified some issues that would make it hard for others to exactly reproduce the results:

a) page 2, line 78: The authors say that during the geometry optimisations, the position of some atoms were restrained. It is not clear which atoms these are and why that choice was made.

b) In section 3.3 (and in Tables 2 and 3) the authors refer to different models for each of the five tested reactions that all vary in system size. This is the first time in the manuscript that the authors refer to this and it is hard to follow how exactly these models differ from each other and what they actually look like. It therefore makes it hard to follow this section and I recommend a revision.

Validity of the findings
As mentioned above, I do understand what the authors are trying to achieve with their study. It is supposed to only serve as a starting point for the future development of an accurate benchmark set. Also, it is likely that the authors’ current computational capabilities are limited and that they therefore use previously published structures and energies. Unfortunately, the chosen level of theories for the geometry optimisations [B3LYP/6-31G(d,p)] and the singlepoint energy calculations [B3LYP/6-311+G(2d,2p)] are questionable for the studied systems and properties. I will first elaborate on this statement, starting with the quality of the geometries, before I will conclude with a recommendation on how the authors could address this problem without substantially delaying publication.

The structural stability of biomolecular systems - and of any sizeable molecule or molecular aggregate – is heavily influenced by important London-dispersion (van-der-Waals) effects, and it is a well-known fact among quantum chemists that B3LYP does not cover these effects. Moreover, small basis sets, such as 6-31G(d,p) induce the so-called basis-set superposition error (BSSE), which is an artificial overstabilisation of noncovalent interaction energies, including London dispersion and hydrogen bonds. BSSE exists both for inter- and intramolecular interactions and it has been demonstrated in the literature that it distorts molecular structures, see e.g. early works by van Mourik and co-workers on peptide conformers:
-van Mourik, T.; Karamertzanis, P. G.; Price, S. L., J. Phys. Chem. A 2006, 110, 8.
- Holroyd, L. F.; van Mourik, T., Chem. Phys. Lett. 2007, 442, 42.

Later, Goerigk and Reimers showed how error compensation between the lack of a proper dispersion treatment and BSSE can artificially generate structures of seemingly acceptable quality. However, it was also demonstrated that this error compensation cannot be controlled and that in fact using dispersion and BSSE corrections led to more accurate and reliable structures. This was first discussed for gas-phase structures of peptides, and later for the crystal structure of a protein fragment:
- Goerigk, L.; Reimers, J. R., J. Chem. Theory Comput. 2013, 9, 3240.
- Goerigk, L.; Collyer, C. A..; Reimers, J. R., J. Phys. Chem. B 2014, 118, 14612.

Efficient corrections for both London dispersion and BSSE exist that do not compromise the computational efficiency of B3LYP/6-31G(d,p) and they should also be mentioned (e.g. Grimme’s DFT-D3(BJ) and gCP corrections). This is particularly important, as this journal is directed at a readership that may not be familiar with these methodological developments and the dispersion and BSSE problems.

While BSSE is negligible for the 6-311+G(2d,2p) basis set, the singlepoint energies still suffer from the London-dispersion problem. The authors already mention the large GMTKN30 benchmark set in their manuscript. An additional study on GMTKN30 with nearly 50 methods has conclusively shown how dispersion can influence reaction energies and barrier heights by several kcal/mol (Goerigk, L.; Grimme, S.; Phys. Chem. Chem. Phys. 2011, 13, 6670.). A proper treatment of dispersion is therefore crucial. That same study has also shown that B3LYP is worse than the average of more than 20 hybrid density functionals for reaction energies. Consequently, using B3LYP/6-311+G(2d,2p) as a benchmark for the study of other methods may therefore distort the findings, and statistical values such as mean absolute deviations have to be interpreted with caution. In fact, it may well be that some of the tested semi-empirical methods perform much better than what the reported MADs may suggest. In this context, note that a recent study by Karton and Goerigk has demonstrated how the quality of the benchmark reference heavily influences the interpretation of low-level methods used to calculate barrier heights of pericyclic reactions (J. Comp. Chem. 2015, 36, 622). If one analyses the results of this study carefully, one can also see the importance of dispersion; for instance, it lowers the B3LYP barrier height of the Diels-Alder reaction between 1,3-butadiene and ethene by nearly 6 kcal/mol!

After this explanation, please let me emphasise that I am not against the way this study has been carried out. The studied reactions are highly interesting and the actual reaction barriers and energies provide useful insights for quantum chemists. Recalculating the reported numbers may therefore not be necessary at this stage and it can be postponed to a future study. Nevertheless, I am of the opinion that this manuscript can benefit from a discussion of the above points, particularly in the outlook section. Dispersion and BSSE should not be neglected and this manuscript is an ideal platform to convey this message to a non-expert readership that may not be aware of how the field of DFT approximations has changed over the past years. Such a discussion should also mention that also the tested semi-empirical methods lack from a proper description of noncovalent interaction energies and that others have combined them with DFT-D2 or DFT-D3-type corrections (DFTB3-D3, PM6-D3, etc. ) and with additional hydrogen-bond corrections for PM6 (see e.g. the works by Hobza and Korth). One or two paragraphs discussing the importance of these points are therefore sufficient.

Comments for the author
page 3, line 104: Is the quotation mark at the beginning a mistake or does it introduce a literal quote from another paper? If the latter is the case, a second quotation mark at the end of the quote is missing.

Reviewer 3 (Anonymous)
Basic reporting
I think this paper represents a nice effort of building a set for benchmarking barrier height in modelling biochemical systems. The authors provide themselves the benchmark for several semiempirical methods, using B3LYP as a reference point. Nevertheless, as the authors acknowledge, they provide a limited number of systems studied and a reference state (B3LYP) which could not be considered the state of the art. Thus, in my opinion the manuscript should be considered more as proposing the benchmark idea than a rigorous initial step towards it. In this regards I miss some section underlying how to expand the set, with some practical example. It could be even more interesting some efforts in making that part easier (semi-automatic) from standard QM or QM/MM calculations (although this part could be expanded in the future).

Experimental design
Since most reports are based on the comparison with B3LYP data (it seems to me that from other sources) I wander if the authors have reevaluated some of these numbers. Being single point evaluations it should not require excessive computational time.

Validity of the findings
To help the inspection of the results (and differences between B3LYP and semiempirical methods), I would like to see the distances (at least the most changing ones) being added in Figures 2-5.

Comments for the author
minor comments:

.) (line 47): "...TS structures are known to dependent significantly"
.) (line 104) " ...kcal/mol. “The larger..." (a non closed ")
.) (line 116) "PM7 (to 4.0 and" (non closed parenthesis)

Wednesday, January 13, 2016

Writing an impact neutral review


The idea of impact neutral reviewing was pioneered by PLoS ONE ten years ago this year.
The idea is that ... PLOS ONE only verifies whether experiments and data analysis were conducted rigorously, and leaves it to the scientific community to ascertain importance, post publication, through debate and comment. [source]
I have now been writing impact neutral reviews for almost three years and sum up my experience so far here.

1. I review all papers with the same, impact neutral, criteria regardless of journal. I don't rank the paper in any way and if there is a required field that I don't like I just enter a "-".

2. I have not recommended rejection of a single paper in that time.*  I have seen several cases where the conclusions were not supported by the data, but in these cases I recommend changing the conclusions instead.  Sometimes the conclusion was, in my opinion, that nothing could be concluded (without further calculations) or that the approach didn't work. But negative results are fine within impact neutrality as long as they are clearly stated as such.

3. All my reviews now start with "In my opinion the following issues should be addressed before the paper is suitable for publication" (see also next point).  I haven't come across any papers where I didn't feel something (sometimes minor) needed to be fixed. "Impact neutral" also means that I don't praise the paper even if I think it is important.

4. In spite of point 2 my reviews for "high impact" journals such as JACS still turn out more "critical" because the conclusions more often need to be "toned down" based on the data in my opinion.

5. As part of sticking-to-the-facts I quote every sentence I have a problem with. I don't question the motives of the authors in writing what they did or express any annoyance I may feel. I phrase a lot of critique as questions and ask authors to clarify.

6. I end all my reviews with "Jan Jensen (I choose to review this paper non-anonymously)". Non-anonymity (onymity) is not really part of being "impact neutral" but I can't see any reasons not to sign my reviews in light of points 1-5.

7. For the same reason I also share all my reviews on Publons, although not all journals allow Publons to make them visible. In fact my greatest motivator (imagined fear) when reviewing is some future reader spotting some obvious factual error in the paper that I missed and then finding my review online.

---
Footnote to point 2 *The closest I got to a rejection was a paper that used some math that was completely foreign to me and very poorly described. I actually went as far as Googling the authors to see whether they were really affiliated with a university as they claimed.  Ultimately I wrote to the editor saying that I could not judge whether this work was legit or not.




Saturday, September 19, 2015

ProCS15 paper: reviews are in

2015.10.2 update: Our rebuttal can be found here.  The paper is now accepted.

The reviews of the ProCS15 paper we submitted on August 25 arrived last evening. 25 days to first decision. The verdict was "minor revisions". The editor was Freddie Salsbury, Jr (who also handled our very first PeerJ paper) and both reviewers chose to sign their reviews.  Another very pleasant publishing experience with PeerJ.

Editor's comments
Both reviewers have some minor corrects to make and the second reviewer raises a point of skepticism about QM-based vs empirical estimators. A discussion addressing this would likely be of benefit to the field.

Reviewer Comments
Reviewer 1 (Xiao He)
Basic reporting
No Comments
Experimental design
No comments
Validity of the findings
No comments
Comments for the author
This manuscript is of great importance and I totally support its publication in PeerJ. The authors present an excellent and accurate chemical shift prediction program (ProCS15) based on millions of DFT calculations on simplified models. ProCS15 has extended the capability of previous ProCS program, which predicts the backbone amide proton chemical shift, to fast estimation of chemical shifts of backbone and C beta atoms in large proteins. The accuracies of chemical shifts on two proteins (namely, Ubiquitin and GB3) predicted by ProCS15 are very close to the results from fragment-based DFT calculations by Zhu et al., and Exner and co-workers. Nevertheless, the computational cost of ProCS15 is within a second. This program will be widely used in the NMR community. I only have a few minor points.

1) In the Introduction section, “RMSD observed for QM-based chemical shift predictions may, at least in part, be due to relatively small errors in the protein structures used for the predictions, and not a deficiency in the underlying method.” I agree with the first half of the statement, however, the limitation of current density functionals also contributes to the discrepancy between experiment and DFT calculations, especially for the 15N chemical shift prediction.

2) The first AF-QM/MM work is highly recommended to be cited in the paper,
He X., Wang B. and Merz K.M., Protein NMR Chemical Shift Calculations Based on the Automated Fragmentation QM/MM Approach. J. Phys. Chem. B 113, 10380 (2009)
Reviewer 2 (Dawei Li)
Basic reporting
No comments.
Experimental design
No comments
Validity of the findings
No comments
Comments for the author
This work is a direct extension of the author’s previous work on quantum based protein chemical shift calculation. The performance is comparable to other quantum based predictors but is worse than current empirical predictors. Because of this, I am still skeptical about all quantum-based predictors. Without solid cross-validation, it is very hard to argue that quantum predictors can capture subtle effect better than empirical predictors. It is true they respond more sensitively to minor structural change, but not necessary in a correct way. On the other hand, it is very useful for the whole community to have more selections that is different from previous ones. (Note that predictions from most empirical predictors are highly correlated, i.e., it won’t provide more information by switching from one to another empirical predictor.) In this context, this work should be published.

It is nice that the prediction performance can be improved a lot if applied to more realistic NMR-derived ensembles. This is expected because the experimental chemical shift of a given nucleus reflects the Boltzmann-weighted average of the 'instantaneous' chemical shifts of a large number of conformational substates that interconvert on the millisecond timescale or faster. This behavior has been discussed many times in the literature. All Ubiquitin NMR structures cited in this work are generated specifically to be a more realistic presentation of protein ensemble in solutions, except 1D3Z. 1D3Z is a traditional NMR structure model, where NMR conformer “bundle” should not be confused with a dynamic ensemble representation of the protein. In these types of NMR models, the spread of atomic positions merely provides information about the uncertainties of the atomic positions with respect to the average structure and has no direct physical meaning. The author may need to provide more comments on this in their last section titled “Comparison to experimental chemical shifts using NMR-derived ensembles”.

Thursday, March 26, 2015

Second reviews of the PCCP paper - almost there

PCCP writes: "I am pleased to inform you that your revised manuscript has been recommended for publication in Physical Chemistry Chemical Physics subject to further revision in line with the attached reports."

Referee: 1
Comments to the Author
Most of my comments have been addressed.

However, I am still concerned about the recommendations for treating low frequency vibrations.  The harmonic oscillator approximation is not appropriate for low frequencies because the entropy term goes to infinity.  As recommended by Grimme (2012), a hindered / free rotor may be a better treatment (for related methods and discussion see TRUHLAR, DG, J COMP CHEM 1991, 12, 266-270 DOI: 10.1002/jcc.540120217 and McClurg, RB, Flagan, RC, Goddard, WA JCP 1997, 106, 6675-6680 DOI: 10.1063/1.473664)

My comment about explicit water molecules also concerned entropy.  While including explicit water molecules may increase the CPU time, this can be overcome.  What extra CPU time cannot overcome is the fact that the explicit waters in reality can exchange with the bulk water increasing their entropy compared to a RRHO calculation of their entropy.  It is best to include explicitly only tightly bound

Immediate reactions
Point 1:
* The paper lists options for how to treat low frequency vibrations but does not make a specific recommendation, because the options have not been thoroughly compared for binding free energies.
* Grimme's approach is not derived from first principles and is therefore not a priori better than other approaches.  It will in some cases treat modes that are clearly stretch vibrations as free rotors.
* The free rotor entropy also goes to infinity as the frequency approaches zero.
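
To make the divergence in Point 1 concrete, here is a minimal sketch of the standard harmonic-oscillator entropy of a single mode, S = R[x/(e^x - 1) - ln(1 - e^-x)] with x = hcν/kT. The function name, the 298.15 K default and the sample wavenumbers are illustrative choices for this sketch and are not taken from the paper or the referee report.

```python
import math

R = 8.314462618          # gas constant, J/(mol K)
HC_OVER_K = 1.438776877  # second radiation constant hc/k, cm K


def harmonic_entropy(wavenumber, temperature=298.15):
    """Harmonic-oscillator vibrational entropy of one mode in J/(mol K).

    wavenumber is in cm^-1 (hypothetical helper, for illustration only).
    """
    x = HC_OVER_K * wavenumber / temperature
    # expm1 keeps the expression numerically stable for very small x
    return R * (x / math.expm1(x) - math.log(-math.expm1(-x)))


if __name__ == "__main__":
    # The entropy keeps growing as the wavenumber is lowered
    for wavenumber in (100.0, 10.0, 1.0, 0.1):
        s = harmonic_entropy(wavenumber)
        print(f"{wavenumber:6.1f} cm^-1: S_vib = {s:6.2f} J/(mol K)")
```

For small x the expression behaves like R(1 - ln x), so it grows without bound (logarithmically) as the frequency goes to zero, which is the behaviour the referee objects to.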

Point 2:
* Any such effect is included implicitly in the parameterization of the solvation free energy of H2O.
* Any error in this parameterization is largely cancelled by using the water cluster approach recommended in the paper.
* Bryantsev et al. have shown that using the water cluster approach leads to smooth convergence of solvation free energies of H+ and Cu2+ that are in good agreement with experiment.




Wednesday, March 4, 2015

Reviews of PCCP paper

The reviews of my PCCP paper arrived on Feb 25th and I forgot to post them. I have started working on the rebuttal.  Comments welcome.

Reviewer(s)' Comments to Author:
Referee: 1

Comments to the Author
This perspective article discusses the many factors that can make small but important contributions to absolute host-guest binding energies computed with electronic structure methods.  This is a valuable contribution to the literature in that it gathers in one place various aspects of solvation and thermodynamics for binding energies.  There are a number of items that I would like to see addressed before this paper is accepted.
(a) State whether ZPE is included in E_gas or in G_gas,RRHO.  I presume in the former, the latter accounting for only the thermal corrections using RRHO. (however, when I see RRHO, I automatically think it includes ZPE)
(b) Recently, some have advocated calculating the gas-phase thermochemistry at an elevated pressure to simulate the decreased translational freedom encountered in solution. Does this affect the thermal and entropy corrections beyond a simple change in volume?  Is this something that should be encouraged?
(c) For delta G_solv(H+), I find it very risky to compute this directly using explicit solvent molecules.  Better to put the H+ on another molecule of known pKa and use continuum solvation to compute the energy difference.
(d) Other ions:  If the ion concentrations are high (experimentally), is it necessary to consider the effect of ionic strength on the activity in calculating the binding energies? Is this a consideration in the computational simulations as well?
(e) Including more than a few explicit waters in a binding energy calculation can also mess up the entropy term, since the positions of these extra water molecules are not sampled adequately.  However, this should not be a problem if only a few tightly bound waters are included.  It would be good to add a comment.
(f) Minor matters:
Pg 3: The sentence before “Molecular Thermodynamics” seems out of place.  Should it be part of the previous paragraph?
Pg 4: “volume of and ideal gas”
Pg 6: “double-differencing” central difference of double numerical difference?
Pg 6: “better to pretend that the imaginary frequency is real” – a very bad idea if the frequency is small, since the entropy blows up. Maybe better to “pretend” that it is a free rotor, which has a well defined entropy.
Pg 9: “van der Waals interactions with the solvent” -> “van der Waals and dispersion interactions with the solvent” (just so there is no misunderstanding)
Pg 10 - Eq 21 and the sentence after it: V_solv or Delta V_solv? (as in Eq 22 and 23)
Pg 11: “numerical instability”? (is this more a matter of numerical noise due to the discretization of the surface elements of the cavity leading to discontinuities in the PES that are problematic for the optimizer – a number of codes have overcome this problem)
Pg 12: For an interesting paper on thermodynamic cycles and solution phase optimization, see DOI: 10.1039/c4cp04538f)
Pg 15: “if protonations states”
Pg 15 – 19, abstract:  The first person is normally not used in scientific writing


Referee: 2

Comments to the Author
This is a perspective about using QM methods to estimate ligand-binding free energies, using approaches originating from QM-cluster studies of enzyme reactions. The perspective is concentrated on the treatment of multiple conformations and pKa effects, although other effects are also mentioned. It is somewhat surprising that the author has not published a single paper on the subject of the perspective; consequently numerical results are very few and discussion is much concentrated on a few publications of the Grimme group. Still, the subject is of general interest. However, the scope needs to be better defined and all methods and formulae need to much better defined before the paper can be accepted.
1. In general, the author should go through all equations and ensure that all terms are defined.
2. The scope of the perspective must be better defined. QM methods have been used for over 10 years for ligand binding to proteins, typically using MM/PBSA-like approaches (cf. publications and reviews by Merz, Hobza and Ryde, for example). Likewise, the author ignores attempts to using QM post-processing FEP calculations.
3. The introduction should start with a more general discussion of available methods to calculate ligand-binding energies and why QM is needed.
4. Different types of QM methods should be described and it should be explained why the author concentrate on DFT and SQM methods.
5. What is TPSS27 (p.3)
6. HF-3c should be explained
7. “by fitting against ∆∆H_f,gas to ∆E_gas values” does not make sense to me.
8. Regarding the low-frequency vibrations, Grimme uses a scaling function so that there are smooth transition between vibrations and free rotation (making the actual value of the frequency unimportant below ~100 cm-1). Truhlar et al. have used a similar approach (but not for ligand binding).
9. The prime problem with conformations is not to use Eqn. 7 but to find all low-energy conformations, including the global minimum.
10. What is the accuracy of computationally estimated pKa values (i.e. what does “fairly accurately” mean quantitatively)? Is it enough for ligand binding?
11. The meaning of dG_solv(H+) should be explained and in general the difference between upper- and lower-case delta should be clarified.
12. What do the over-bar X and L in Eqns 14 and 16, etc. signify?
13. It should be “van der Waals”.
14. I think the selection of the reference state is primarily determined by what experimental results you want to reproduce.
15. A short description of available CM approaches would be appropriate, referring to Table 1.
I suppose you need to specify the variant of PCM also (IEF or C or what?).
16. COSMO-RS is parametrized for many more levels of theory than BP/TZVP.
17. References to the accuracy of solvation energies should be given. In the SAMPL competitions, appreciably worse results are typically seen.
18. I think problem with converging solution-phase optimizations is a problem special to the implementation in Gaussian. With COSMO in Turbomole, no such problems are ever seen.
19. A recent update to Ho et al. 2010 is PCCP 2015, 17, 2859.
20. Since the author only considers water solvation, he should consider changing “solvation” to “hydration”.
21. What is meant by “(dispersion and free energy contributions to the binding free energy” on p.19.





Saturday, May 17, 2014

Pre- and post-publication peer review: some new tools

There have been some interesting developments in peer review lately:

Altmetric it is a bookmarklet you install in your browser.  Once installed go to any recent publication on the journal web page and click on the bookmarklet to get altmetric data (once you have installed it try this computational chemistry paper as an example).  Most importantly you get links to twitter comments and blog posts (I hear Google+ and PubPeer integration is on the way) where a lot of post-publication peer review happens.

I think this is the future of post-publication peer review: don't worry about where it happens, just make sure it can be easily found.

PubPeer is a discussion forum centered around journal articles and preprints (click here for a computational chemistry example). These discussion fora are notoriously difficult to get off the ground so I am impressed by how much discussion is already going on there.  PubPeer has recently made an extension that adds links to PubPeer comments on Pubmed search results and journal websites.

PeerLibrary allows you to create and share annotated versions of papers.  If you think about it, the most common form of post-publication peer review is the highlighting of text and margin notes we all make when we read a paper.  PeerLibrary allows you to share this.

I do most of my annotations using iAnnotate on my iPad and uploading pre-annotated pdf files does not seem to work for all pages, so I am not sure how much I will use this service yet.  Also, the text in the uploaded files is currently quite grainy. But I like the general idea very much and I'll probably use it more if/when these things get fixed. Anyway, you can see what I have messed around with so far here.

Publons is a site for listing, and getting credit for, the reviews you do.  This appeals greatly to me since I usually review anonymously. I have just had a brief look at it (you can see my profile here) but I think my workflow will be the following:

(1) Prepare (e.g. paste in) the review and de-select "Has this article been published?".  This will just list the article title and journal.  As the site states "We can not publish the content of your review until the article has been published, as this would be unfair to the author if they decided to resubmit the article."

(2) Once the article is published, you can make the actual review visible.  It would be nice if Publons could keep track of this (using the info you provided) and alert you.

You can also review papers already published (in which case it's a bit like PubPeer) and you can also review anonymously.


Have I missed some cool reviewing tools?  If so, leave a comment.


This work is licensed under a Creative Commons Attribution 4.0

Tuesday, March 18, 2014

ROC curves and picking cutoffs

We just got the 2nd round of reviews for +Luca De Vico's latest PLoS ONE paper.  In the paper we try to predict HIV protease mutants that will cleave a particular peptide sequence and we use peptide-protein interaction energies as a measure of cleavability.  How well does this work?  The reviewer suggested ROC curves to quantify this.  Here's how it works.

We have 11 naturally occurring peptides that we know are cleavable (there are also some non-natural peptides that I'll ignore in this post) and 42 that we know are non-cleavable. Here are the computed interaction energies (in kcal/mol) for all cleavable peptides and for the non-cleavable peptides with interaction energies < -40 kcal/mol.

Cleavable (11)    Non-cleavable (42)
-72
-68               -68
-68               -63
-64               -54
-63               -49
-62               -45
-62               -45
-57               -44
-52               -42
-47
-41

If we say that peptides with interaction energies < -40 kcal/mol are cleavable then we will have correctly predicted that all 11 cleavable peptides are cleavable, but also that 8 non-cleavable peptides will be cleavable.  Put another way, our "true positive" rate is 100% (11/11) and our "false positive" rate is 19% (8/42).
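The counting above can be reproduced with a few lines of Python. This is only a minimal sketch: the energies are copied from the table, the function name `rates` is illustrative, and the 34 non-cleavable peptides that are not listed are assumed to have energies of -40 kcal/mol or higher, so they never count as false positives for cutoffs at or below -40 kcal/mol.

```python
# Interaction energies in kcal/mol, copied from the table above.
cleavable = [-72, -68, -68, -64, -63, -62, -62, -57, -52, -47, -41]
# Only the non-cleavable peptides below -40 kcal/mol are listed.
non_cleavable_listed = [-68, -63, -54, -49, -45, -45, -44, -42]
# Total number of non-cleavable peptides; the 34 unlisted ones are assumed
# to lie at or above -40 kcal/mol.
N_NON_CLEAVABLE = 42


def rates(cutoff):
    """Return (true positive rate, false positive rate) when peptides with
    energy < cutoff are predicted to be cleavable (valid for cutoff <= -40)."""
    true_positives = sum(e < cutoff for e in cleavable)
    false_positives = sum(e < cutoff for e in non_cleavable_listed)
    return true_positives / len(cleavable), false_positives / N_NON_CLEAVABLE


tpr, fpr = rates(-40)
print(f"cutoff -40: true positive rate {tpr:.0%}, false positive rate {fpr:.0%}")
```

For a cutoff of -40 kcal/mol this counts 11 of 11 cleavable and 8 of 42 non-cleavable peptides, i.e. the 100% and 19% quoted above.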

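If we pick -45 kcal/mol as the cutoff (so that only peptides with interaction energies < -45 kcal/mol are predicted to be cleavable) the numbers are 91% (10/11) and 10% (4/42): we have fewer false positives but we miss some true positives. The plot of true vs false positives is an ROC curve: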


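In a perfect world our true positive rate would be 100% and our false positive rate would be 0%, so we are looking for the point on the curve closest to (0, 1), i.e. a false positive rate of 0 and a true positive rate of 1. That point happens to correspond to a cutoff of -45 kcal/mol.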
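One way to find that cutoff is to scan the observed energies and keep the one whose ROC point has the smallest Euclidean distance to the ideal corner. The sketch below assumes the same data as in the table, treats the 34 unlisted non-cleavable peptides as lying at or above -40 kcal/mol, and uses illustrative function names; it is not the script used for the paper.

```python
import math

# Interaction energies in kcal/mol, copied from the table above.
cleavable = [-72, -68, -68, -64, -63, -62, -62, -57, -52, -47, -41]
non_cleavable_listed = [-68, -63, -54, -49, -45, -45, -44, -42]
N_NON_CLEAVABLE = 42  # the 34 unlisted peptides are assumed to be >= -40


def roc_point(cutoff):
    """Return (false positive rate, true positive rate) for 'energy < cutoff'."""
    tpr = sum(e < cutoff for e in cleavable) / len(cleavable)
    fpr = sum(e < cutoff for e in non_cleavable_listed) / N_NON_CLEAVABLE
    return fpr, tpr


def distance_to_ideal(cutoff):
    """Euclidean distance from the ROC point to the ideal corner (0, 1)."""
    fpr, tpr = roc_point(cutoff)
    return math.hypot(fpr, 1.0 - tpr)


# Candidate cutoffs: every observed energy plus the -40 kcal/mol threshold.
candidates = sorted(set(cleavable + non_cleavable_listed + [-40]))
best_cutoff = min(candidates, key=distance_to_ideal)
print(best_cutoff, roc_point(best_cutoff))
```

Other criteria for picking a cutoff exist (for example, maximizing the difference between the true and false positive rates); the distance to (0, 1) is simply the one used in this post.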

We can also quantify how good this approach is in general by finding the area under the curve, which will range from 1 (perfect) to 0.5 (useless, i.e. no better than guessing) and, for example, compare two different methods for calculating the interaction energies.
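The area under the ROC curve equals the probability that a randomly chosen cleavable peptide has a lower interaction energy than a randomly chosen non-cleavable one, with ties counted as one half. That makes it easy to estimate from the table alone. The sketch below relies on one assumption taken from the table's description: the 34 non-cleavable peptides that are not listed all have energies of -40 kcal/mol or higher, so every cleavable peptide (all below -40 kcal/mol) ranks ahead of them. Variable names are illustrative.

```python
# Interaction energies in kcal/mol, copied from the table above.
cleavable = [-72, -68, -68, -64, -63, -62, -62, -57, -52, -47, -41]
non_cleavable_listed = [-68, -63, -54, -49, -45, -45, -44, -42]
N_NON_CLEAVABLE = 42
n_unlisted = N_NON_CLEAVABLE - len(non_cleavable_listed)

# Assumption: unlisted non-cleavable peptides are at or above -40 kcal/mol.
# Every cleavable peptide is below -40, so each of these pairs is ranked correctly.
assert all(e < -40 for e in cleavable)
score = float(len(cleavable) * n_unlisted)

# Pairwise comparison against the listed non-cleavable peptides.
for c in cleavable:
    for n in non_cleavable_listed:
        if c < n:
            score += 1.0   # cleavable peptide ranked ahead: correct ordering
        elif c == n:
            score += 0.5   # tie: counts as half

auc = score / (len(cleavable) * N_NON_CLEAVABLE)
print(f"AUC = {auc:.3f}")
```

Running the same calculation on energies from a second method would give a second number to compare against, which is the kind of comparison mentioned above.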



Friday, December 13, 2013

Review of Hybrid RHF/MP2 geometry optimizations with the effective fragment molecular orbital method

The reviews of +Anders Steen Christensen and +Casper Steinmann's PLoS ONE paper are in. Some preliminary thoughts:

Question 1. I think the main problem is that we left out a lot of details because they have been discussed extensively in this paper. So we need to refer to this paper more extensively.

Question 5. Reviewer #2: 
Point 1. we should clarify
Point 2. don't understand, in what way unclear
Point 3. we should make such a figure.  We shouldn't show individual fragments, but rather which parts are treated with MP2 and which parts are frozen.

------
From: PLOS ONE <plosone@plos.org>
Date: Wed, Dec 11, 2013 at 8:17 PM
Subject: PLOS ONE Decision: Revise [PONE-D-13-43802] - [EMID:f0cd9b87a193051a]
To: "Anders S. Christensen" <xxx>


PONE-D-13-43802
Hybrid RHF/MP2 geometry optimizations with the effective fragment molecular orbital method
PLOS ONE

Dear Mr. S. Christensen,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit, but is not suitable for publication as it currently stands. Therefore, my decision is "Major Revision." 

We invite you to submit a revised version of the manuscript that addresses the points below: 

while this manuscript presents a likely technical advance in QM/MM that could be significant, there is a lack of clarity and context in the manuscript. Each reviewer has noted different aspects that suggest a difficulty in understanding to what extent this method improves upon existing methods, and to what extent this method can be applied across multiple systems.
I encourage you to address each point made by the reviewers. The points relating to comparing this method to others and to explaining discrepancy are particularly important. This manuscript would also benefit from a reorganization and a more critical comparison to other methods.

We encourage you to submit your revision within forty-five days of the date of this decision. I recognize this might not be possible given the recommendations, so I encourage you to ask for an extension if necessary.

When your files are ready, please submit your revision by logging on to http://pone.edmgr.com/ and following the Submissions Needing Revision link. Do not submit a revised manuscript as a new submission. Before uploading, you should proofread your manuscript very closely for mistakes and grammatical errors. Should your manuscript be accepted for publication, you may not have another chance to make corrections as we do not offer pre-publication proofs.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. 

Please also include a rebuttal letter that responds to each point brought up by the academic editor and reviewer(s). This letter should be uploaded as a Response to Reviewers file.

In addition, please provide a marked-up copy of the changes made from the previous article file as a Manuscript with Tracked Changes file. This can be done using 'track changes' in programs such as MS Word and/or highlighting any changes in the new document. 

If you choose not to submit a revision, please notify us. 

Yours sincerely, 

xxx
Academic Editor
PLOS ONE


Reviewers' comments:



Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes



Please explain (optional).

Reviewer #1: This is relevant paper on using MP2 with effective fragment molecular orbital method and demonstrated to be alternative to ONIOM. The paper would be potentially valuable but I would suggest more discussion about the potential of the method and its outputs to be done. Chorismate mutase is the "hydrogen atom" for QM/MM modelling so there is vast majority of data from many groups, therefore there is a potential in this paper for more comprehensive discussion.

Reviewer #2: (No Response)

Reviewer #3: This study compared the EFMO method with ONIOM method as for the reaction free energy barrier for the Chorismate Mutase. In general, the results are more consistent than that of the ONIOM. This review agrees that the current manuscript is publishable, and expect the authors to explain the possible reasons for: (1) the calculated free energy barrier is much higher than that of the experimentally measured enthalpy change? (2) The authors claimed that the MP2-geometry optimization make it 3.5 kcal/mol lower for the free energy barrier than that of the ONIOM method, however, the listed data of free energy barrier in Table2 is close to each other at the same calculation level. (3) The portability to other enzyme system of EFMO method?



2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: I don't know

Reviewer #3: Yes



Please explain (optional).

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: The data of all tables and figures are clean and good.



3. Does the manuscript adhere to standards in this field for data availability?

Authors must follow field-specific standards for data deposition in publicly available resources and should include accession numbers in the manuscript when relevant. The manuscript should explain what steps have been taken to make data available, particularly in cases where the data cannot be publicly deposited.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes



Please explain (optional).

Reviewer #1: (No Response)

Reviewer #2: (No Response)

Reviewer #3: This is a typical study on the topic of QM/MM method and application for enzyme reaction.



4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors below.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes



Please explain (optional).

Reviewer #1: (No Response)

Reviewer #2: The reference should be gived as [1-13]in the text,but not [1,2,3,4,5,6,7,8,9,10,11,12,13].

Reviewer #3: Yes, the whole manuscript is organized very well, and written well.



5. Additional Comments to the Author (optional)

Please offer any additional comments here, including concerns about dual publication or research or publication ethics.

Reviewer #1: (No Response)

Reviewer #2: The authors implemented the correlated method in the EFMO/FDD approximation on the optimizing a complex of chorismate mutase and chorismate. The authors have presented the transition state structure, reaction barrier, and reaction energy, etc. While the method has more improved the results than the previous work, the paper as present is organized unclearly. There is hardly any insight that can be gained from this word. The manuscript is unsuitable for publication in current version.
To name a few questions.
1. In the theory part, the given molecular system is described, which is defined into tow domains F and A. But in the following description, the b domain (buffer domain) is contained. The system is divided into three domains or two domains? 
2. The Table 1 and 2 is disordered.
3. The complex which divided into different domains should show in a figure, which describes the structure and thedevision of different domains in the complex of chorismate mutase and chorismate. It makes the computed model direct and clear.

Reviewer #3: no additional comments at this time.



6. If you would like your identity to be revealed to the authors, please include your name here (optional).

Your name and review will not be published with the manuscript.

Reviewer #1: (No Response)

Reviewer #2: (No Response)


Reviewer #3: (No Response)