## Sunday, May 7, 2017

### Predicting pKa values using PM3 - new code and conformer search

Disclaimer: These are preliminary results and may contain errors

The code I used in my previous pKa prediction paper had some limitations that I have now mostly removed: The setup of protonation states is automated and includes acids like acetic acid and phenol, there is a rudimentary tautomer generations, and the reference molecules are chosen a bit more systematically.

I ran the new code on the same set of molecules and saw a few differences.  Some were expected and due to different references and overall protonation states but most major differences were due to different conformations even though I used the same code to generate conformers. This let me to look at the effect of conformer generation.

The original study used 20 MMFF-minimised conformations ("20mm") generated using RDKit as starting points for PM3/COSMO minimisations.  Now I've tried skipping the MMFF minimisation ("nomm"), 50 conformations with and without MMFF minimisation, and a scheme where I MMFF-minimise 50 or 100 conformations and select conformers that have energies within either 10 or 20 kcal/mol of the lowest energy ("XecutY") for each protonation state/tautomer.

The RMSEs are very similar, but 50nomm has the fewest number of cases where the error in the pKa value (ΔpKa) is greater than 1 pH unit. However, the major outlier for 20mm and 20nomm is Sparteine for which 20mm and 20nomm actually find a conformer that is 4.9 kcal/mol lower than for 50nomm, for the protonated state.

This lead me to look at the energies themselves. The numbers labelled ΔE in the table are computed as follows. For a given molecule I find the lowest energy for the protonated and deprotonated state for each method and identify the method with the lowest energy $E_{min}$. Then I count the number of molecules for which the difference to $E_{min}$ is greater than 1.36 kcal/mol and average that number for protonated and deprotonated to get one number per method.

So there are 4 molecules for which 50nomm doesn't something close to the "global" minimum for either the protonated or deprotonated state, or both. In principle, I should go on and try 100nomm to see if I can "converge", but that's starting to become pretty expensive and the point is trying to develop a practically useful tool.

The 100ecut20 results suggest that the MM gas phase energetics don't represent the PM3/COSMO energetics all that well.  So it would be interesting to try a MM conformational search with an implicit solvent model. RDKit can't do this, but Macrmodel and NAMD should be able to do this.  It also looks like the next release of OpenMM will have this capability.

Suggestions and comments most welcome.