## Tuesday, July 10, 2018

### Comparing a Monte Carlo tree search and a genetic algorithm for conformational search

I've been playing around with this Monte Carlo tree search (MCTS) code (if you need a short intro to MCTS click here). I want to learn how to use MCTS to optimise molecular properties, so to get started I modified the code to minimise a very simple energy function that mimics a torsional potential:

$$E = \sum_i \left[ 1 + \cos(\phi_i) + \cos(3\phi_i) \right]$$

It seems to work OK, but to get a point of comparison for its efficiency I tested it against a genetic algorithm (GA). You can find the codes here and here.

More specifically, I tested a function with 10 dihedral angles, each with three possible values (180.0, 60.0, and -60.0), meaning there are 3^10 = 59,049 possible combinations (i.e. "conformations"). Each 180 angle contributes -1 to the energy and each ±60 angle contributes 0.5, so the global minimum is the conformation in which all angles are 180, with an energy of -10.
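These numbers are easy to check by brute force. The following is a minimal Python sketch, not the code linked above; the names `ANGLES`, `torsion_energy`, and `energy` are chosen here purely for illustration.

```python
import itertools
import math

ANGLES = (180.0, 60.0, -60.0)  # allowed values for each dihedral, in degrees
N_DIHEDRALS = 10

def torsion_energy(phi_deg):
    """Energy contribution of one dihedral: 1 + cos(phi) + cos(3*phi)."""
    phi = math.radians(phi_deg)
    return 1.0 + math.cos(phi) + math.cos(3.0 * phi)

def energy(conformation):
    """Total energy of a conformation given as a sequence of angles in degrees."""
    return sum(torsion_energy(phi) for phi in conformation)

# Enumerate every conformation; 3**10 is small enough to do exhaustively.
all_conformations = list(itertools.product(ANGLES, repeat=N_DIHEDRALS))
all_energies = [energy(c) for c in all_conformations]

print(len(all_conformations))        # number of conformations
print(round(min(all_energies), 6))   # lowest energy (all angles at 180)
print(round(max(all_energies), 6))   # highest energy (no angle at 180)
```

Exhaustive enumeration is only feasible because this toy problem is tiny; it is useful here as a reference against which to judge the search methods.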

If you simply make 10 x 1000 random combinations of 10 dihedral angles and average the lowest energy found among each set of 1000 you get 0.35. This makes sense: the highest-energy conformers have an energy of +5 and the lowest has -10, but there are many more ways to make a high-energy conformer, so the average should be around 0.
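The "more ways to make it" argument can be made concrete by counting. A conformer with k angles at 180 has energy -k + 0.5(10 - k), and there are C(10, k) · 2^(10-k) such conformers. The sketch below (variable names are illustrative) tabulates this and computes the mean energy over all conformations.

```python
from math import comb

N_DIHEDRALS = 10

degeneracy = {}        # k (number of dihedrals at 180) -> number of conformations
total = 0
weighted_energy = 0.0
for k in range(N_DIHEDRALS + 1):
    # Choose which k dihedrals are 180; each of the others is either +60 or -60.
    count = comb(N_DIHEDRALS, k) * 2 ** (N_DIHEDRALS - k)
    e = -1.0 * k + 0.5 * (N_DIHEDRALS - k)
    degeneracy[k] = count
    total += count
    weighted_energy += count * e
    print(f"{k:2d} angles at 180: E = {e:5.1f}, {count:5d} conformations")

mean_energy = weighted_energy / total
print(total, mean_energy)
```

There are 1,024 conformers at +5 but only one at -10, and the mean over all 59,049 conformations is exactly 0.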

Using 1000 iterations (i.e. 1000 energy evaluations) the MCTS finds the global minimum 0 out of 10 times and gives an average energy of -7.6. In 4 cases the energy is -8.5 (i.e. it misses one dihedral) and -7 in the rest (i.e. it misses 2 dihedrals).

For comparison, the GA finds the global minimum 0 out of 10 times and gives an average energy of -6.85. In 2 cases the energy is -8.5, in 5 cases the energy is -7, and in the rest, -5.5. I use a population size of 50 and 20 generations, which also requires 1000 energy evaluations, and a mutation rate of 0.01. Using a population size of 20 and 50 generations results in an average energy of -6.7.
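For readers who want to see where the 1000 energy evaluations come from, here is a minimal GA sketch with the same population size, number of generations, and mutation rate. It is not the GA code linked above: the tournament selection, one-point crossover, and all function names are assumptions made for illustration, so it should not be expected to reproduce the numbers quoted here.

```python
import math
import random

ANGLES = (180.0, 60.0, -60.0)

def energy(conformation):
    """Sum of 1 + cos(phi) + cos(3*phi) over all dihedrals (angles in degrees)."""
    total = 0.0
    for phi_deg in conformation:
        phi = math.radians(phi_deg)
        total += 1.0 + math.cos(phi) + math.cos(3.0 * phi)
    return total

def tournament(population, scores, rng):
    """Assumed selection scheme: the better of two randomly picked individuals."""
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    return population[i] if scores[i] <= scores[j] else population[j]

def run_ga(n_dihedrals=10, population_size=50, generations=20,
           mutation_rate=0.01, seed=None):
    rng = random.Random(seed)
    population = [[rng.choice(ANGLES) for _ in range(n_dihedrals)]
                  for _ in range(population_size)]
    best_energy, best_conformation = float("inf"), None
    evaluations = 0
    for generation in range(generations):
        # All individuals in a generation are independent, so this scoring
        # step is the part that could be run in parallel.
        scores = [energy(individual) for individual in population]
        evaluations += population_size
        for score, individual in zip(scores, population):
            if score < best_energy:
                best_energy, best_conformation = score, list(individual)
        if generation == generations - 1:
            break  # the last generation is only scored, not bred
        children = []
        for _ in range(population_size):
            mother = tournament(population, scores, rng)
            father = tournament(population, scores, rng)
            cut = rng.randrange(1, n_dihedrals)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Each gene is redrawn at random with probability mutation_rate.
            child = [rng.choice(ANGLES) if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children
    return best_energy, best_conformation, evaluations

best_energy, best_conformation, evaluations = run_ga(seed=0)
print(best_energy, evaluations)
```

Calling `run_ga(population_size=20, generations=50)` gives the second setting mentioned above, again with 1000 evaluations in total.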

I'm actually a bit surprised that MCTS outperforms the GA, because I don't recall seeing MCTS being used for conformational search. One possible reason is that it's hard to see how to parallelise MCTS, since the energy evaluations are done sequentially, while in a GA they can be done in parallel within each generation.

I did try running 10 100-iteration MCTSs, which could be done in parallel, and 1 of them managed to find a -8.5 conformer so maybe that's a possibility. Another way would be to do a complete expansion of a terminal node and do the three energy evaluations in parallel. This might also be a better search strategy in any case. I am not quite sure how to implement this yet (recursion is not my strong suit) so any suggestions are welcome.
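One possible shape for the "expand fully and evaluate in parallel" idea is sketched below. It is not taken from the MCTS code linked above and covers only the expansion step, not selection or backpropagation. The function `expand_and_evaluate` and the random rollout used to complete each child are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import math
import random

ANGLES = (180.0, 60.0, -60.0)

def energy(conformation):
    """Sum of 1 + cos(phi) + cos(3*phi) over all dihedrals (angles in degrees)."""
    total = 0.0
    for phi_deg in conformation:
        phi = math.radians(phi_deg)
        total += 1.0 + math.cos(phi) + math.cos(3.0 * phi)
    return total

def expand_and_evaluate(prefix, n_dihedrals=10, rng=None):
    """Create all three children of a node and score them concurrently.

    `prefix` holds the angles chosen so far. Each child is completed with
    random angles (a rollout) before its energy is evaluated.
    """
    if len(prefix) >= n_dihedrals:
        raise ValueError("node is already a complete conformation")
    rng = rng or random.Random()
    children = [list(prefix) + [angle] for angle in ANGLES]
    rollouts = [child + [rng.choice(ANGLES)
                         for _ in range(n_dihedrals - len(child))]
                for child in children]
    # The three evaluations are independent, so they can be submitted together.
    with ThreadPoolExecutor(max_workers=len(ANGLES)) as pool:
        energies = list(pool.map(energy, rollouts))
    return list(zip(children, rollouts, energies))

# Example: nine dihedrals already set to 180, so only the last one is open.
for child, rollout, e in expand_and_evaluate([180.0] * 9, rng=random.Random(0)):
    print(child[-1], round(e, 6))
```

For this toy energy function threads give no real speed-up. With an expensive energy evaluation, for example one that calls an external program, the same pattern with `ProcessPoolExecutor` or a job queue is where the parallelism would pay off.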

Also the reward function could probably be optimised. Currently it is simply 1 if the energy is lower than the current "global" minimum and 0 otherwise. Suggestions for other functions are most welcome.
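To make the discussion concrete, the reward described here can be written as a one-line function. The second function below is a purely hypothetical alternative, not something tested in this post: it maps the energy linearly onto [0, 1] using the known bounds of this toy function (-10 and +5), which would not be available for a real molecule.

```python
def binary_reward(energy_value, best_energy):
    """Reward as described in the text: 1 for a new best energy, else 0."""
    return 1.0 if energy_value < best_energy else 0.0

def scaled_reward(energy_value, e_min=-10.0, e_max=5.0):
    """Hypothetical alternative: 1 at e_min, 0 at e_max, linear in between."""
    return (e_max - energy_value) / (e_max - e_min)

print(binary_reward(-8.5, best_energy=-7.0))
print(binary_reward(-7.0, best_energy=-7.0))
print(scaled_reward(-8.5))
```

The binary reward only rewards improvements over the best energy found so far, so it becomes sparse as the search progresses, while a scaled reward also distinguishes between non-improving conformers. Whether that helps the search in practice would have to be tested.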

## Saturday, July 7, 2018

### Planned papers for 2018 - six months in

In January I wrote about the papers I plan to publish in 2018 and made this list:

Accepted
1. Fast and accurate prediction of the regioselectivity of electrophilic aromatic substitution reactions

Probable
2. Random Versus Systematic Errors in Reaction Enthalpies Computed using Semi-empirical and Minimal Basis Set Methods
3. Improving Solvation Energy Predictions using the SMD Solvation Method and Semi-empirical Electronic Structure Methods

Maybe
4. Towards a barrier height benchmark set for biologically relevant systems - part 2
5. pKaSQM: Automated Prediction of pKa Values for Druglike Molecules Using Semiempirical Quantum Chemical Methods
6. Prediction of CH pKa values

The status is

Submitted

Probably submitted in 2018
4. Towards a barrier height benchmark set for biologically relevant systems - part 2

Paper 4: We've already gathered the data for 11 new systems to add to the data set and Jimmy's working on applying what we learned in paper 2 to most of these systems. We have to submit the paper in October as part of a special issue. I hope we'll make it.

I don't see papers 5 and 6 happening in the foreseeable future as we're getting further into high-throughput screening of various molecular properties.

## Friday, July 6, 2018

### Why I just sent my paper to Journal of Chemical Physics

The last few years I have published my papers (i.e. the ones where I'm corresponding author) open access as much as possible, but I just sent my latest paper to JCP. Here's why.

Why not OA?
The paper isn't "impactful" enough for free OA options such as Chemical Science and ACS Central Science.

When it comes to impact-neutral journals my main consideration is price. My go-to journal in the past has been PeerJ where I purchased a life-time membership years ago. My two co-authors also have memberships so we could have published the article for free. Unfortunately, the current paper is not "bio enough" for PeerJ.

It's also not "organic" enough for Beilstein Journal of Organic Chemistry (which is free) and it doesn't fall into any of the categories for Living Journal of Computational Molecular Science ($100). The next-cheapest options are ACS Omega ($750) or RSC Advances (£500 ≈ $670). When I published in ACS Omega earlier this year I discovered that you still sign away copyright to the ACS, who then graciously agree to publish it under a CC license if you pay them $750! In my book that's not OA and I don't think I'll publish there again.

I was all set to send it to RSC Advances but a few months ago I discovered that the RSC is engaging in the sort of behaviour I would normally expect from Elsevier. I don't see why I should support the RSC with an additional £500.