Tuesday, July 10, 2018

Comparing a Monte Carlo tree search and a genetic algorithm for conformational search


I've been playing around with this Monte Carlo tree search (MCTS) code (if you need a short intro to MCTS, click here). I want to learn how to use MCTS to optimise molecular properties, so to get started I modified the code to minimise a very simple energy function that mimics a torsional potential.
$$E = \sum_i \left[1 + \cos(\phi_i) + \cos(3\phi_i)\right]$$ It seems to work OK, but in order to get some point of comparison for the efficiency I tested it against a genetic algorithm (GA). You can find the codes here and here.

More specifically, I tested a function with 10 dihedral angles, each with three possible values (180.0, 60.0, and -60.0), meaning there are $3^{10}$ = 59,049 possible combinations (i.e. "conformations"). Each 180° angle contributes -1 to the energy and each ±60° angle contributes 0.5, so the global minimum is the conformation in which all angles are 180°, with an energy of -10.

If you simply make 10 × 1000 random combinations of the 10 dihedral angles and average the lowest energy found in each set of 1000, you get 0.35. This makes sense because the highest energy conformer has an energy of +5, but there are more ways to make the high-energy angles: each angle contributes -1 with probability 1/3 and +0.5 with probability 2/3, so the average energy of a random conformation should be around 0.
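For concreteness, here is a minimal sketch of the energy function and the random-sampling baseline (a simplified stand-in for the linked codes; the names and structure are my own):

import math
import random

ANGLES = [180.0, 60.0, -60.0]

def energy(dihedrals):
    # E = sum_i [1 + cos(phi_i) + cos(3 phi_i)], phi in degrees:
    # 180 contributes -1 and +/-60 contributes +0.5
    return sum(1.0 + math.cos(math.radians(phi)) + math.cos(math.radians(3.0 * phi))
               for phi in dihedrals)

# 10 sets of 1000 random 10-dihedral "conformations"
for _ in range(10):
    energies = [energy([random.choice(ANGLES) for _ in range(10)])
                for _ in range(1000)]
    print(min(energies), sum(energies) / len(energies))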

Using 1000 iterations (i.e. 1000 energy evaluations), the MCTS finds the global minimum 0 out of 10 times and gives an average energy of -7.6. In 4 cases the energy is -8.5 (i.e. it misses one dihedral) and in the rest it is -7 (i.e. it misses two dihedrals).

For comparison, the GA also finds the global minimum 0 out of 10 times and gives an average energy of -6.85. In 2 cases the energy is -8.5, in 5 cases it is -7, and in the rest, -5.5. I use a population size of 50 and 20 generations, which also requires 1000 energy evaluations, and a mutation rate of 0.01. Using a population size of 20 and 50 generations instead results in an average energy of -6.7.
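For reference, here is a minimal GA sketch with these settings (again a simplified stand-in for the linked code; the tournament selection and one-point crossover are my guesses and the actual implementation may differ):

import math
import random

ANGLES = [180.0, 60.0, -60.0]
N_DIHEDRALS, POP_SIZE, N_GEN, MUT_RATE = 10, 50, 20, 0.01

def energy(dihedrals):
    return sum(1.0 + math.cos(math.radians(phi)) + math.cos(math.radians(3.0 * phi))
               for phi in dihedrals)

def tournament(pop, scores, k=3):
    # return the fittest (lowest-energy) of k randomly chosen individuals
    contenders = random.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]

pop = [[random.choice(ANGLES) for _ in range(N_DIHEDRALS)] for _ in range(POP_SIZE)]
best, best_score = None, float("inf")
for _ in range(N_GEN):
    scores = [energy(ind) for ind in pop]  # POP_SIZE evaluations per generation
    i = min(range(POP_SIZE), key=lambda j: scores[j])
    if scores[i] < best_score:
        best, best_score = pop[i][:], scores[i]
    new_pop = []
    for _ in range(POP_SIZE):
        p1, p2 = tournament(pop, scores), tournament(pop, scores)
        cut = random.randrange(1, N_DIHEDRALS)  # one-point crossover
        child = [random.choice(ANGLES) if random.random() < MUT_RATE else gene
                 for gene in p1[:cut] + p2[cut:]]  # point mutations
        new_pop.append(child)
    pop = new_pop
print(best_score, best)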

I'm actually a bit surprised that MCTS outperforms the GA, because I don't recall seeing MCTS used for conformational search. One possible reason is that it's hard to see how to parallelise MCTS, since its energy evaluations are done sequentially, while for a GA they can be done in parallel within each generation.

I did try running ten 100-iteration MCTSs, which could be done in parallel, and one of them managed to find a -8.5 conformer, so maybe that's a possibility. Another way would be to do a complete expansion of a terminal node and do the three energy evaluations in parallel. This might also be a better search strategy in any case. I am not quite sure how to implement this yet (recursion is not my strong suit), so any suggestions are welcome.

The reward function could probably also be optimised. Currently it is simply 1 if the energy is lower than the current "global" minimum and 0 otherwise. Suggestions for other functions are most welcome.
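In code, the current reward looks something like this (best_energy is the lowest energy seen so far in the search):

def reward(energy, best_energy):
    # 1 if this rollout found a new global minimum, 0 otherwise
    return 1.0 if energy < best_energy else 0.0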



This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Saturday, July 7, 2018

Planned papers for 2018 - six months in

In January I wrote about the papers I plan to publish in 2018 and made this list:

Accepted
1. Fast and accurate prediction of the regioselectivity of electrophilic aromatic substitution reactions

Probable
2. Random Versus Systematic Errors in Reaction Enthalpies Computed using Semi-empirical and Minimal Basis Set Methods
3. Improving Solvation Energy Predictions using the SMD Solvation Method and Semi-empirical Electronic Structure Methods

Maybe
4. Towards a barrier height benchmark set for biologically relevant systems - part 2
5. pKaSQM: Automated Prediction of pKa Values for Druglike Molecules Using Semiempirical Quantum Chemical Methods
6. Prediction of CH pKa values

The status is:


Submitted
2. Random Versus Systematic Errors in Reaction Enthalpies Computed using Semi-empirical and Minimal Basis Set Methods
3. Improving Solvation Energy Predictions using the SMD Solvation Method and Semi-empirical Electronic Structure Methods

Probably submitted in 2018
4. Towards a barrier height benchmark set for biologically relevant systems - part 2

Paper 4: We've already gathered the data for 11 new systems to add to the data set, and Jimmy's working on applying what we learned in paper 2 to most of these systems. We have to submit the paper in October as part of a special issue. I hope we'll make it.

I don't see papers 5 and 6 happening in the foreseeable future, as we're getting further into high-throughput screening of various molecular properties.



This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Friday, July 6, 2018

Why I just sent my paper to Journal of Chemical Physics


For the last few years I have published my papers (i.e. the ones where I'm corresponding author) open access as much as possible, but I just sent my latest paper to JCP. Here's why.

Why not OA?
The paper isn't "impactful" enough for free OA options such as Chemical Science and ACS Central Science.

When it comes to impact-neutral journals my main consideration is price. My go-to journal in the past has been PeerJ, where I purchased a lifetime membership years ago. My two co-authors also have memberships, so we could have published the article for free. Unfortunately, the current paper is not "bio enough" for PeerJ.

It's also not "organic" enough for the Beilstein Journal of Organic Chemistry (which is free), and it doesn't fall into any of the categories for the Living Journal of Computational Molecular Science ($100).

The next-cheapest options are ACS Omega ($750) and RSC Advances (£500 ≈ $670). When I published in ACS Omega earlier this year I discovered that you still sign away copyright to the ACS, who then graciously agrees to publish the paper under a CC license if you pay them $750! In my book that's not OA, and I don't think I'll publish there again.

I was all set to send it to RSC Advances, but a few months ago I discovered that the RSC is engaging in the sort of behaviour I would normally expect from Elsevier. I don't see why I should support the RSC with an additional £500.

Those are the only journals I know of with APCs less than $1000, and I simply don't have that kind of money to spend on publications right now.

Why JCP?
In March 2016, AIP Publishing switched to a new publishing agreement where authors retain copyright but give the AIP an exclusive licence to publish. As the copyright owner you are allowed to, for example:

1. Reprint portions of the Work (excerpts, figures, tables) in future works created by the Author, in keeping with professional publication ethics.
2. Post the Accepted Manuscript (AM) to their personal web page or their employer's web page immediately after acceptance by AIP Publishing.
3. Deposit the AM in an institutional or funder-designated repository immediately after acceptance by AIP Publishing.
5. Reprint the Version of Record (VOR) in print collections written by the Author, or in the Author's thesis or dissertation. It is understood and agreed that the thesis or dissertation may be made available electronically on the university's site or in its repository and that copies may be offered for sale on demand.
10. Post the VOR to their personal web page or their employer's web page 12 months after publication by AIP Publishing.
11. Deposit the VOR in an institutional or funder-designated repository 12 months after publication by AIP Publishing.
12. Update a prior posting with the VOR on a noncommercial server such as arXiv, 12 months after publication by AIP Publishing.

Just to be clear, most non-OA journals allow you to post the accepted manuscript in a repository 12 months after publication, while JCP allows you to do this immediately, as well as to post the version of record (i.e. the typeset manuscript!) 12 months later.

So it's not OA, because reuse is restricted, but the version of record will be openly accessible after 12 months. And at least I feel like I am dealing with a publisher that is not looking for any opportunity to screw me, or the scientific community in general, over. But maybe I'm just getting sentimental in my old age ...



This work is licensed under a Creative Commons Attribution 4.0 International License.

Monday, July 2, 2018

Gaussian Processes for dumb dummies

It's quite disheartening when you don't understand something titled "xxx for dummies", and that's how I felt after reading Katherine Bailey's blogpost "Gaussian Processes for Dummies". Luckily, the post included Python code and a link to some lecture notes, for which I also found the corresponding recorded lecture on YouTube.

Having looked at all that, the blogpost made more sense and I now feel I understand Gaussian Processes (GP) a little bit better, at least as applied to regression. Here is my take on it.

What is GP good for?
GP is an algorithm that takes x and y coordinates as input and returns a numerical fit together with the associated standard deviation at each point. What makes GP special is that you don't have to choose a mathematical function for the fit, and you get the uncertainty of the fit at each point.

I refactored the code from Katherine Bailey's blogpost here and show a plot of a GP fit ("mu", red line) to five points of a $\sin$ function (blue squares, "Xtrain" and "ytrain"). "Xtest" is a vector of x-coordinates for which I want to evaluate the fit, "mu". "stdv" is a vector of standard deviations of the fit at each point in Xtest, and the gray shading in the plot represents 2 standard deviations of uncertainty. We'll get to "L" later, and there is a hyperparameter ("param" in the code) that we also need to talk about.

mu, stdv, L = Gaussian_process(Xtest, Xtrain, ytrain)


What does the GP algorithm do?
1. Construct the kernel matrix $\mathbf{K_{**}}$, where $K_{**,ij} = e^{-(x_i-x_j)^2/(2\lambda)}$, for the test set. The kernel is a measure of how similar $x_i$ is to $x_j$, and $\lambda$ is a parameter, called "param" in the code (note to self: "lambda" is a bad choice for a variable name in Python).

2. Construct the kernel matrix $\mathbf{K}$ for the training set and Cholesky-decompose it to get $\mathbf{L}$, i.e. $\mathbf{K = LL^T}$.

3. Construct the kernel matrix $\mathbf{K_*}$ connecting the test set to the training set and compute $\mathbf{L_k = L^{-1}K_*}$ (use the solve function for better numerical stability).

4. Compute  $\boldsymbol{\mu} = \mathbf{ L^T_kL^{-1}y}_{train}$

5. Compute the standard deviation $s_i = \sqrt{K_{**,ii} - \sum_j L^2_{k,ji}}$, where $K_{**,ii}$ is the $i$th diagonal element of $\mathbf{K_{**}}$ and the sum runs over the training points.

6. Compute a new $\mathbf{L}$ (the one returned by the code) by Cholesky-decomposing $\mathbf{K_{**}} - \mathbf{L_k^T L_k}$.
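Putting the six steps together, here is a sketch of the whole function (essentially Katherine Bailey's code, lightly rewritten; the small jitter added before each Cholesky decomposition is for numerical stability):

import numpy as np

def kernel(a, b, param):
    # squared-exponential kernel: K_ij = exp(-(a_i - b_j)^2 / (2 param))
    sqdist = np.sum(a**2, 1).reshape(-1, 1) + np.sum(b**2, 1) - 2*np.dot(a, b.T)
    return np.exp(-0.5 / param * sqdist)

def Gaussian_process(Xtest, Xtrain, ytrain, param=0.1):
    K_ss = kernel(Xtest, Xtest, param)                     # step 1
    K = kernel(Xtrain, Xtrain, param)                      # step 2
    L = np.linalg.cholesky(K + 1e-5*np.eye(len(Xtrain)))
    K_s = kernel(Xtrain, Xtest, param)                     # step 3
    Lk = np.linalg.solve(L, K_s)
    mu = np.dot(Lk.T, np.linalg.solve(L, ytrain))          # step 4
    stdv = np.sqrt(np.diag(K_ss) - np.sum(Lk**2, axis=0))  # step 5
    L = np.linalg.cholesky(K_ss + 1e-6*np.eye(len(Xtest))
                           - np.dot(Lk.T, Lk))             # step 6
    return mu, stdv, L

# five points of a sin function, as in the plot above
Xtrain = np.array([-4.0, -3.0, -2.0, -1.0, 1.0]).reshape(-1, 1)
ytrain = np.sin(Xtrain).flatten()
Xtest = np.linspace(-5, 5, 50).reshape(-1, 1)
mu, stdv, L = Gaussian_process(Xtest, Xtrain, ytrain)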

What is the basic idea behind the GP algorithm?
How would you write an algorithm that generates a random function $y=f(x)$? I would argue that the simplest way is simply to generate a random number $y \sim \mathcal{N}(0,1)$ (i.e. y = np.random.normal()) for each value of $x$. ($\mathcal{N}(0,1)$ is standard notation for a Gaussian distribution with 0 mean and a standard deviation of 1.)

Here's a plot of three such functions.


If I generated 1000 of them, the average y-value at each x-coordinate, $\langle y_i \rangle$, would be 0. I can change this by drawing $y \sim \mathcal{N}(\mu,\sigma^2) = \mu + \sigma \mathcal{N}(0,1)$ instead.
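Here is a minimal sketch that generates such functions:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 100)
for _ in range(3):
    y = np.random.normal(size=len(x))  # an independent N(0,1) draw for each x
    plt.plot(x, y)                     # use mu + sigma*y to change mean and spread
plt.show()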

You'll notice that these functions are much more jagged than the functions you usually work with. The reason is that for typical functions the values of $y$ tend to be similar when the corresponding values of $x$ are similar, i.e. the $y$ values are correlated by distance ($|x_i - x_j|$), whereas here each $y$ value is drawn independently.

This correlation can be quantified by the kernel matrix $\mathbf{K}$ and used to generate smoother functions by $y_i \sim \sum_j L_{ij} \mathcal{N}(0,1)$, where $\mathbf{L}$ comes from the Cholesky decomposition $\mathbf{K = LL^T}$. This works great, as you can see from this plot.
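Something like this reproduces the plot (I use the same squared-exponential kernel as above; $\lambda$ = 0.1 is my choice):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 100)
K = np.exp(-0.5 / 0.1 * (x.reshape(-1, 1) - x)**2)  # kernel matrix
L = np.linalg.cholesky(K + 1e-6*np.eye(len(x)))     # K = L L^T (jitter for stability)
for _ in range(3):
    y = np.dot(L, np.random.normal(size=len(x)))    # y_i = sum_j L_ij N(0,1)
    plt.plot(x, y)
plt.show()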


You can think of $\mathbf{K}$ and $\mathbf{L}$ a bit like the variance $\sigma^2$ and the standard deviation $\sigma$. $\langle y_i \rangle = 0$ as before, but this can be changed in analogy with the uncorrelated case: $y_i \sim \mu + \sum_j L_{ij} \mathcal{N}(0,1)$.

The GP is a way of generalising this equation as $y_i \sim \mu_i + \sum_j L_{ij} \mathcal{N}(0,1)$ and using the training data to obtain values for $\mu_i$ and $L_{ij}$ such that $\mu_i$ matches the y-values in the training data, with correspondingly small $L_{ij}$ values, i.e. greater certainty. Now if you generate 1000 such random functions and average them, you get $\boldsymbol{\mu}$.
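Continuing from the Gaussian_process sketch above (mu and L are the values it returns for Xtest), generating and averaging such random functions looks like this:

import numpy as np

# draw 1000 random functions from the fit and average them
samples = [mu + np.dot(L, np.random.normal(size=len(mu))) for _ in range(1000)]
print(np.max(np.abs(np.mean(samples, axis=0) - mu)))  # close to 0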



This work is licensed under a Creative Commons Attribution 4.0 International License.