Proteins and Wave Functions: September 2018

Tuesday, September 11, 2018

Reviews of Solvation Energy Predictions Using The SMD Solvation Method and Semiempirical Electronic Structure Methods

Really late posting this. The paper is already out at JCP. Here are the reviews for the record.

Reviewer #1 Evaluations:
Recommendation: Revision
New Potential Energy Surface: No

Reviewer #1 (Comments to the Author):

In this contribution, the authors report a set of systematic analyses of semi-empirical (NDDO and DFTB) methods combined with continuum solvation models (COSMO and SMD) for the description of solvation free energies of well-documented benchmark cases (the MNSOL dataset). They found that the performance of NDDO and DFTB continuum solvation models can be substantially improved when the atomic radii are optimized, and that the results are most sensitive to the radii of HCNO. Another interesting observation is that the optimized radii have a considerable degree of transferability to other solvents.

Since an efficient computation of solvation free energies and related quantities (e.g., pKa values) is valuable in many chemical and biological applications, the results of this study are of considerable interest to the computational chemistry community.

In the current form, the ms can benefit from further discussion of several points:

1. The authors chose to optimize the atomic radii based entirely on element type (e.g., HCNOS). In the literature, many solvation models either further consider atom types (e.g., UAKS) or atomic charge distribution (based on either atomic point charges or charge density); in many cases, a higher degree of accuracy appears to be obtained. It would be useful to further clarify the principle behind the current optimization and the expected level of accuracy; for example, to what degree should we expect the same set of radii to work well for both neutral and ionic (especially anionic) species?
2. Although it is well known - it is useful to explicitly point out that the experimental values for neutral and charged species have different magnitudes of errors.
3. It would be informative to further dissect/discuss the physical origins for the errors of NDDO/DFTB continuum solvation models. For example, are the larger errors (as compared to, for example, HF based calculations) due primarily to the less reliable description of the solute charge density (e.g., multipole moments) or solute polarizability? Discussion along this line might be relevant to the transferability of the optimized model to non-aqueous solvents.

4. Cases with very large errors deserve further analysis/discussion - for example, some neutral solutes apparently have very large errors at HF, NDDO and DFTB levels - as much as 20 or even 30 kcal/mol! What are these molecules? Are the same set of molecules problematic for all methods? What is the physical origin for these large errors?

Reviewer #2 Evaluations:
Recommendation: Revision
New Potential Energy Surface: No

Reviewer #2 (Comments to the Author):

In this paper, the authors make the case for efficient solvation models in
fast electronic structure methods (currently heavily utilized for high-throughput
screening approaches). They extend an implementation of PM6 in the Gamess
programm to account for d orbitals. The SMD and COSMO continuum models in combination with
various semi-empirical NDDO and also DFT tight-binding approaches is considered.
Their analysis clearly highlights deficiencies of the semi-empiricial approaches
compared to HF/DFT. The authors then proceed to propose a remedy (changing the
radii for H, C, O, N, and S). Although this change was driven by data on aqueous
solvation energies, the authors find that other polar solvents (DMSO, CH3CN, CH3OH)
are also improved, which is a sign of transferability of this simple fix.
The prediction of pKa data, as an important application field, concludes the
results section. The paper is clearly written, however, it raises questions that
should be addressed in a revision:
1) Table 2 shows very (too) small Coulomb radii for H and on page 6 this is commented on.
The authors note that for radii smaller than 0.6 A the proton moved into the solvent.
However, no further analysis if provided. I assume that this is due to an increased outlying
charge and this outlying charge shoud be quantified. Apparently, some error compensation
is in operation. This also relates to the statement 'error for the ions is considerably larger
than for neutral molecules' on page 5. Error compensation also raises a concern about
transferability that the authors must address.
2) The authors should also review their list of references (I assume that the first author's
surname in Ref. 1 is misspelled, the abbreviations of journals are not in JCP style, Ref. 34
appears to lack a journal title, Ref. 16 lacks author names ...).
3) Moreover, figures 1-3 lack a label for the y-axis, figure 4 lacks units
on the y-axis.
4) Few typos need also be removed (see, e.g., "mainly only" -> "mainly on"
on page 2).

This work is licensed under a Creative Commons Attribution 4.0

Sunday, September 2, 2018

Assigning bond orders

This last week I've been working off and on changing the way xyz2mol.py assigns bond orders. xyx2mol is an implementation of this paper and works by first assigning connectivity and then increasing the bond order between atoms with unsatisfied valences. As the article points out the outcome depends on the order you do it in.

For example, if you start by adding a double bond between atoms 6 and 7 in biphenyl (1), you don't get the right result. The authors propose to start by adding double bonds between atoms with the lowest connectivity, which works fine for biphenyl.

But Mads found an example (2) where that doesn't work. According to the rules you would add double bonds between 1-7, 3-4, and 8-9, which is not correct.

I solved this by finding the longest path of atoms unsatisfied valences (7-1-2-3-4-5-8-9) and then adding double bonds between alternate pairs. If there is an uneven number of such atoms you get different results depending on where you start, but that could probably be fixed by some ad-hoc rules. However, this approach fails for benzoquinone (3), where the longest path is 7-6-1-2-3-4-5. (Finding the longest path can also become intractable for many atoms)

The approach I have settled on so far, finds the maximum number of non-consecutive double bonds. So if you have 6 or 7 atoms with unsatisfied valence you can have a maximum of 3 non-consecutive double bonds. I find this combination by generating all combinations of three bonds and determining the one with the most different atoms (e.g. [(1,2), (3,4)] is better than [(1,2), (2,3)]. This doesn't scale all that well when you're trying to place 10 or more double bonds, so if you have a better idea I'm all ears.

3.09.2018 update. Thanks to Geoff's suggestion the code now runs considerably faster

Assigning bond orders, whether in an aromatic structure or not, is a specific case of the maximal weighted matching graph algorithm. You can certainly come up with fast heuristics for certain kinds of structures, but IIRC scaling is ~N^3 in general
— Geoff Hutchison (@ghutchis) September 2, 2018