Proteins and Wave Functions: Screening for large energy storage capacity of meta-stable dihydroazulenes

Thursday, January 10, 2019

Here's a summary of where we are with Mads' project.

The Challenge

Dihydroazulenes (DHAs) are promising candidates for storing solar energy as chemical energy, which can be released as thermal energy when needed. The ideal DHA derivative has a large $\Delta H_{rxn}$ and a $\Delta G^{\ddagger}$ that is high enough to give a half-life of days to months but low enough that the energy release can still be induced. Of course, any modification should not adversely affect light absorption. This presents an interesting optimisation challenge!
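To get a feel for what "days to months" means in terms of $\Delta G^{\ddagger}$, here is a minimal sketch that converts a barrier into a first-order half-life with the Eyring equation, $k = (k_B T/h)\,e^{-\Delta G^{\ddagger}/RT}$ and $t_{1/2} = \ln 2/k$. The transmission coefficient of 1, the temperature of 298.15 K and the function name are assumptions made for illustration, not part of the project.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
R_KCAL = 1.98720425864e-3  # gas constant, kcal/(mol*K)

def half_life_seconds(dg_act_kcal, temperature=298.15):
    """First-order half-life from the Eyring equation (transmission coefficient assumed to be 1)."""
    rate = (K_B * temperature / H_PLANCK) * math.exp(-dg_act_kcal / (R_KCAL * temperature))
    return math.log(2) / rate

# Print the half-life in days for a few illustrative barrier heights
for dg in (23.0, 25.0, 27.0):
    days = half_life_seconds(dg) / 86400
    print(f"dG_act = {dg:.1f} kcal/mol -> half-life of about {days:.2g} days")
```

In this simple model, barriers of roughly 25 to 27 kcal/mol correspond to half-lives of a few days to a few months at room temperature, and each additional kcal/mol lengthens the half-life by about a factor of five.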

We asked Mogens Brøndsted Nielsen to come up with a list of substituents, and he suggested 40 different substituents and 7 positions, which results in 164 billion different molecules (we chose to interpret the right-hand figure more generally). We decided to start by looking at all single and double substitutions, which amounts to 35,588 different molecules. The first question is what level of theory will allow us to screen this many molecules.
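The size of the search space can be checked with a few lines of arithmetic. The sketch below assumes a simple counting convention that is mine rather than the project's: the full library is "number of choices per position" to the power 7, and the single/double set picks 1 or 2 of the 7 positions and places any substituent on each.

```python
from math import comb

N_POSITIONS = 7

def n_full_library(n_choices):
    """Every position independently takes one of n_choices groups."""
    return n_choices ** N_POSITIONS

def n_single_and_double(n_substituents):
    """Molecules with exactly one or exactly two substituted positions."""
    singles = N_POSITIONS * n_substituents
    doubles = comb(N_POSITIONS, 2) * n_substituents ** 2
    return singles + doubles

print(n_full_library(40))       # full library with 40 choices per position
print(n_single_and_double(40))  # singles + doubles with 40 substituents
print(n_single_and_double(41))  # singles + doubles with 41 substituents
```

With this convention, $40^7 \approx 1.64 \times 10^{11}$ matches the 164 billion quoted above, while 35,588 is reproduced with 41 substituents rather than 40 (40 gives 33,880), so the exact convention used for the two numbers may differ slightly, for example in whether hydrogen is counted as one of the choices.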

The initial screen

At a minimum we need to compute $\Delta E_{rxn}$, which involves at least a rudimentary conformational search for both reactants and products. We used an approach similar to this study ($5+5n_{rot}$ RDKit-generated start geometries), which results in over 1 million SQM geometry optimisations, but we used GFN-xTB instead of PM3 because the former is about 10 times faster.
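A minimal sketch of how such start geometries could be generated with RDKit is shown below. It is not the project's actual script: the function names, the random seed and the choice of RDKit's default rotatable-bond definition for $n_{rot}$ are assumptions, and the subsequent GFN-xTB optimisation of each conformer is left out.

```python
def n_start_geometries(n_rot):
    """Number of start geometries for a molecule with n_rot rotatable bonds."""
    return 5 + 5 * n_rot

def embed_start_geometries(smiles, seed=42):
    """Return an RDKit molecule with up to 5 + 5*n_rot embedded conformers."""
    # Imported here so the counting helper above can be used without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem, rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    # Count rotatable bonds on the heavy-atom graph, before adding hydrogens
    n_rot = rdMolDescriptors.CalcNumRotatableBonds(mol)
    mol = Chem.AddHs(mol)
    # Embedding can fail for individual conformers, so fewer may be returned
    AllChem.EmbedMultipleConfs(mol, numConfs=n_start_geometries(n_rot), randomSeed=seed)
    return mol
```

Calling `embed_start_geometries` with the SMILES string of a DHA derivative returns a molecule whose conformers (`mol.GetConformers()`) can be written out as input for the SQM optimisations; a molecule with three rotatable bonds would get 20 start geometries.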

To find the barriers, we did a 12-point scan along the breaking bond in DHA (out to 3.5 Å), starting from the lowest-energy DHA conformer. The highest-energy structure was then used as the starting point for a TS search with "calcall". We used ORCA for the scan and Gaussian for the TS search, with PM3 throughout because it is implemented in both programs. We also optimised the lowest-energy vinylheptafulvene (VHF) structure with PM3 to compute the barrier. We verified each TS by checking whether the imaginary normal mode lies along the reacting bond. This worked in 32,623 cases. The whole thing took roughly 5 days on roughly 250 cores.
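The two bookkeeping steps in this workflow, picking the highest point of the scan and checking that the imaginary mode lies along the reacting bond, can be sketched as follows. The post does not say how the check was implemented, so the criterion below (the share of the two atoms' displacement that changes their distance) and all function names are assumptions for illustration; parsing the ORCA and Gaussian output is not shown.

```python
import math

def scan_maximum(energies):
    """Index and energy of the highest point along a bond scan."""
    idx = max(range(len(energies)), key=lambda k: energies[k])
    return idx, energies[idx]

def _norm(vec):
    return math.sqrt(sum(c * c for c in vec))

def bond_stretch_fraction(coords, mode, i, j):
    """Share of the displacement of atoms i and j that changes the i-j distance.

    coords and mode are lists of (x, y, z) tuples, one per atom.
    Returns a value between 0 and 1; 1 means the two atoms move purely
    along the bond in opposite directions.
    """
    bond = [b - a for a, b in zip(coords[i], coords[j])]
    unit = [c / _norm(bond) for c in bond]
    # Relative displacement of atom j with respect to atom i in this mode
    rel = [b - a for a, b in zip(mode[i], mode[j])]
    along = abs(sum(r * u for r, u in zip(rel, unit)))
    total = _norm(mode[i]) + _norm(mode[j])
    return along / total if total > 0 else 0.0
```

A TS candidate could then be accepted when `bond_stretch_fraction` for the imaginary mode exceeds some threshold. Any such threshold would need to be calibrated against a few manually inspected cases, and this crude measure ignores how much of the mode sits on the other atoms.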

Note that this approach finds a TS to cis-VHF, which we assume is in thermal equilibrium with the lower energy trans-VHF form. For both barriers and reaction energies we use the electronic energy differences rather than free energies of activation and reaction enthalpies.

The next step

We can afford to do a reasonably careful (DFT/TZV) study on at most 50 molecules, so the next question is how to identify the top 50 candidates. In other words, to what extent can we trust the conformational search and the xTB and PM3 energies? I plan to cover this in a future blog post.

A more efficient initial screen

We now have data with which to test more efficient ways of performing the initial screen:

1. (a) We could perform the conformational search using MMFF and only xTB-optimise the lowest energy DHA and VHF structures.
(b) We could only perform the TS search for molecules with large $ \Delta E_{rxn}$ values.
(c) We could perform the bond scan with xTB rather than with PM3 in ORCA.
(d) We could test whether the bond-scan barrier can be used instead of the TS-based barrier.

2. We could test the use of ML-based energy functions such as ANI-1 instead of SQM.

3. We could test whether ML can be trained to predict $\Delta E_{rxn}$ and/or $\Delta E^{\ddagger}$ based on the DHA structure.
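Several of these ideas, for example 1(d), boil down to asking whether a cheaper quantity ranks the molecules in the same order as the more expensive one. One possible way to test that on the existing data is a Spearman rank correlation. The sketch below is a dependency-free version, and the choice of metric is my assumption rather than something the project has settled on.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = mean_rank
        pos = end + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Given two hypothetical lists `scan_barriers` and `ts_barriers` for the same molecules, `spearman(scan_barriers, ts_barriers)` close to 1 would suggest that the scan alone preserves the ordering needed for screening. For picking a top 50, the overlap between the two top-50 lists is probably an even more direct test.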

We'd be very happy to collaborate on this.

Beyond double substitution

No matter how efficient we make the initial screen, screening all 164 billion molecules is simply not feasible. Instead we'll need to use search algorithms such as genetic algorithms, possibly guided by ML models such as random forests.
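To make the idea concrete, here is a minimal sketch of a genetic algorithm over the 7 positions and 40 choices per position. Everything in it is hypothetical: the integer encoding, the population settings and especially `toy_score`, which is a placeholder and stands in for a fitness built from computed $\Delta E_{rxn}$ and barrier values.

```python
import random

N_POSITIONS = 7
N_CHOICES = 40  # substituent indices 0..39; no chemistry is modelled here

def toy_score(genome):
    """Placeholder fitness; a real screen would use computed energies."""
    return -sum((g - 17) ** 2 for g in genome)

def mutate(genome, rng, rate=0.2):
    """Replace each position by a random substituent with probability rate."""
    return tuple(rng.randrange(N_CHOICES) if rng.random() < rate else g
                 for g in genome)

def crossover(a, b, rng):
    """Single-point crossover of two parent genomes."""
    cut = rng.randrange(1, N_POSITIONS)
    return a[:cut] + b[cut:]

def genetic_search(score, generations=50, pop_size=40, n_keep=10, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randrange(N_CHOICES) for _ in range(N_POSITIONS))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the best genomes unchanged (elitism) and breed the rest
        parents = sorted(population, key=score, reverse=True)[:n_keep]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - n_keep)]
        population = parents + children
    return max(population, key=score)

best = genetic_search(toy_score)
print(best, toy_score(best))
```

With these settings the search evaluates only a few thousand genomes out of the 164 billion, which is the point of the approach. Whether it finds good candidates in practice depends on how smooth the real fitness landscape is and on how cheaply the score can be computed.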

Other ideas, comments or questions on this or anything else related to this blog post are very welcome.