## Friday, November 6, 2020

### Mapping atoms to reactants in products made with reaction SMARTS

Reaction SMARTS is an RDKit feature that is very useful for generating reactant-product pairs for a given reaction. Unfortunately, the algorithm changes the atom order between reactants and products, which creates problems when one tries to locate the reaction path using an interpolation-based algorithm such as nudged elastic band (NEB).

Fortunately, RDKit keeps track of the change in atom order (thanks to my MS student Julius Seumer for the tip!) and it's easy to reorder the atoms:

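    from rdkit import Chem

    def reorder_product(product):
        # react_atom_idx: for each product atom, the index of the reactant atom it came from
        reorder_inverse = [int(atom.GetProp('react_atom_idx')) for atom in product.GetAtoms()]
        reorder = len(reorder_inverse)*[0]

        # invert the mapping so that reorder[reactant index] = product index
        for i in range(len(reorder_inverse)):
            reorder[reorder_inverse[i]] = i

        product = Chem.RenumberAtoms(product, reorder)

        return product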
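
The index bookkeeping in `reorder_product` can be checked without RDKit. The sketch below is only an illustration: the atom symbols and the `react_atom_idx` values are made up, and `invert_permutation` is a hypothetical helper that repeats the loop from the function above on plain lists.

```python
def invert_permutation(perm):
    """Return inverse such that inverse[perm[i]] == i for every i."""
    inverse = [0] * len(perm)
    for i, target in enumerate(perm):
        inverse[target] = i
    return inverse


# Hypothetical example: product atom i came from reactant atom react_atom_idx[i].
react_atom_idx = [2, 0, 3, 1]
product_symbols = ["O", "C", "H", "N"]  # made-up symbols in product order

# new_order[r] is the product atom that sits at reactant position r,
# which is the ordering convention the renumbering step expects.
new_order = invert_permutation(react_atom_idx)
reordered_symbols = [product_symbols[j] for j in new_order]

print(new_order)          # [1, 3, 0, 2]
print(reordered_symbols)  # ['C', 'N', 'O', 'H']
```

After reordering, position `r` in the product holds the atom that was at position `r` in the reactant, which is what an interpolation method needs.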

If the 3D structures are to be used for interpolation, it is important to embed the reactant structure before converting it to the product. This keeps the "label-chirality" of groups such as CH2 the same in reactants and products.

Here is a demo notebook

## Saturday, October 31, 2020

### atom_mapper: matching all atoms in reactants and products

It's been almost three years since I wrote about atom_mapper, and now the code has received a major overhaul thanks to my PhD student Mads Koerstz. The 2D atom mapping part is basically left untouched; the main new thing is a general implementation of the 3D mapping problem.

The 3D mapping problem is this: if labels (i.e. atom order) are taken into account, then every tetrahedral center is chiral, and the chirality of centers with equivalent atoms, such as CH2 groups, generated by RDKit's embed function will be arbitrary and unlikely to match in reactants and products. This creates problems for methods such as nudged elastic band (NEB) that try to determine the reaction path by interpolation.
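
To make the "label chirality" idea concrete, here is a small pure-Python sketch. It is not the atom_mapper code: the coordinates are an idealized tetrahedron, the atom indices are made up, and `label_chirality` is a hypothetical helper that takes the sign of the signed volume spanned by the three lowest-indexed neighbours of a center.

```python
def signed_volume(center, a, b, c):
    """Scalar triple product (a-center) . ((b-center) x (c-center))."""
    ax, ay, az = (p - q for p, q in zip(a, center))
    bx, by, bz = (p - q for p, q in zip(b, center))
    cx, cy, cz = (p - q for p, q in zip(c, center))
    return (ax * (by * cz - bz * cy)
            + ay * (bz * cx - bx * cz)
            + az * (bx * cy - by * cx))


def label_chirality(coords, center, neighbours):
    """Sign (+1/-1) of the signed volume of the three lowest-indexed neighbours."""
    first, second, third = sorted(neighbours)[:3]
    volume = signed_volume(coords[center], coords[first], coords[second], coords[third])
    return 1 if volume > 0 else -1


# Hypothetical CH2 group: atom 0 is carbon, 1 and 2 are substituents, 3 and 4 are hydrogens.
reactant_coords = {0: (0, 0, 0), 1: (1, 1, 1), 2: (1, -1, -1), 3: (-1, 1, -1), 4: (-1, -1, 1)}

# Same geometry, but the two hydrogens have traded places,
# as can happen when the product is embedded independently.
product_coords = dict(reactant_coords)
product_coords[3], product_coords[4] = reactant_coords[4], reactant_coords[3]

reactant_sign = label_chirality(reactant_coords, 0, [1, 2, 3, 4])
product_sign = label_chirality(product_coords, 0, [1, 2, 3, 4])
print(reactant_sign, product_sign)  # 1 -1
```

The two structures are chemically identical, but interpolating between them would force the two hydrogens to swap positions, which is the mismatch described above.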

Mads found a clever approach using chiral atom tags, where he generates arbitrary tags for the reactant and makes sure the tags match in the product. If there are true chiral centers he also generates all enantiomers.

The old version had some code that tried to align the coordinates, but that has been removed since it can be done much better with xTB's reaction path method.

## Tuesday, June 16, 2020

### Generating a random molecule from a chemical formula

Theo posted the following question on the RDKit mailing list:

> is there maybe a way with RDKit to generate random (but valid) molecules with a given chemical sumformula?
> For example:
> C12H9N could generate Carbazole as valid compound.
> The output would be mol or SMILES.

This is actually a difficult problem if one wants to enumerate all the possibilities, but it is not too difficult to whip up code that suggests some possibilities, though some of the suggestions may be pretty unrealistic.

I start by generating a linear hydrocarbon with the correct number of heavy atoms. Then I randomly change some of the carbons to the other atoms in the molecule. If there are too many hydrogens, I introduce multiple bonds and rings until the atom count is correct. Here I use some of the mutation operations from my graph-based genetic algorithm.
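
The first two steps, and the number of rings or multiple bonds that must then be introduced, can be sketched in plain Python. This is a rough illustration rather than the code from the post: the function names are made up, only C, N, O and S with their usual valences are handled, and the ring/multiple-bond mutations themselves (which need RDKit) are left out.

```python
import random
import re

# Usual valences; restricting to these elements is an assumption of this sketch.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2}


def parse_formula(formula):
    """Turn a formula such as 'C12H9N' into {'C': 12, 'H': 9, 'N': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts


def random_linear_skeleton(counts, rng=random):
    """SMILES of an unbranched, saturated chain with the heavy atoms in random order."""
    heavy = [symbol for symbol, n in counts.items() if symbol != "H" for _ in range(n)]
    rng.shuffle(heavy)
    return "".join(heavy)


def unsaturations_needed(counts):
    """Number of rings plus extra bonds required to reach the target hydrogen count."""
    heavy_counts = {s: n for s, n in counts.items() if s != "H"}
    # A saturated acyclic chain has sum(valence - 2) + 2 hydrogens.
    max_hydrogens = sum((VALENCE[s] - 2) * n for s, n in heavy_counts.items()) + 2
    return (max_hydrogens - counts.get("H", 0)) // 2


counts = parse_formula("C12H9N")
skeleton = random_linear_skeleton(counts, random.Random(0))
print(skeleton)                      # a random 13-atom chain containing one N
print(unsaturations_needed(counts))  # 9
```

For C12H9N the count is 9, which matches carbazole (three rings and six double bonds in a Kekulé structure). Each ring closure or bond-order increase removes two hydrogens, so nine such mutations bring the chain to the target formula.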

One issue is that it will only produce linear molecules for saturated systems. This can be fixed by adding some branching mutations, e.g. CCCC>>CC(C)C.

## Sunday, January 19, 2020

### Computing Graph Edit Distance between two molecules using RDKit and Networkx

During a Twitter discussion Noel O'Boyle introduced me to Graph Edit Distance (GED) as a useful measure of molecular similarity. The advantages over other approaches such as Tanimoto similarity are discussed in these slides by Roger Sayle.

It turns out Networkx can compute this, so it's relatively easy to interface with RDKit and the implementation is shown below.

Unfortunately, the time required for computing GED increases exponentially with molecule size, so this implementation is not really of practical use.
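
To see where the cost comes from, here is a deliberately naive pure-Python sketch. It is not the Networkx algorithm and not the implementation from this post: it assumes unit costs for every node and edge edit, ignores bond orders, and simply tries every assignment of atoms in one molecule to atoms in the other, so the work grows factorially with the number of heavy atoms.

```python
from itertools import permutations


def brute_force_ged(labels1, edges1, labels2, edges2):
    """Graph edit distance with unit costs, by exhaustive search over node mappings."""
    n = max(len(labels1), len(labels2))
    # Pad the smaller graph with dummy nodes (None); mapping to a dummy
    # corresponds to inserting or deleting that node.
    a = list(labels1) + [None] * (n - len(labels1))
    b = list(labels2) + [None] * (n - len(labels2))
    e1 = {frozenset(edge) for edge in edges1}
    e2 = {frozenset(edge) for edge in edges2}

    best = None
    for perm in permutations(range(n)):
        # Node substitutions, insertions and deletions.
        cost = sum(a[i] != b[perm[i]] for i in range(n))
        # Edge insertions and deletions.
        for i in range(n):
            for j in range(i + 1, n):
                in_first = frozenset((i, j)) in e1
                in_second = frozenset((perm[i], perm[j])) in e2
                cost += in_first != in_second
        if best is None or cost < best:
            best = cost
    return best


# Heavy-atom graphs written by hand (element labels and bonded index pairs).
propane = (["C", "C", "C"], [(0, 1), (1, 2)])
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])

print(brute_force_ged(*propane, *ethanol))         # 1
print(brute_force_ged(*ethanol, *dimethyl_ether))  # 2
```

Three heavy atoms mean only 6 mappings, but a 20-atom molecule already has about 2.4e18, which is why smarter search strategies are needed for anything drug-sized.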

Sayle's slides discuss one solution to this, but it's far from trivial to implement. If you know of other open source implementations, please let me know.

Update: GitHub page

## Saturday, January 18, 2020

### Open access chemistry publishing options in 2020

Here is an updated list of affordable, impact-neutral and other select OA publishing options for chemistry.

Impact neutral journals

$0 (in 2020) PeerJ chemistry journals. Open peer review. (Disclaimer: I am an editor for PeerJ Physical Chemistry.)

$638 (normally $850) Results in Chemistry. Closed peer review.

$1000 F1000Research. Open peer review. Bio-related.

$1095 PeerJ - Life and Environment. Open peer review. Bio-related. PeerJ also has a membership model, which may be cheaper than the APC.

$1250 ACS Omega. Closed peer review. WARNING: not real OA. You still sign away your copyright to the ACS.

(The RSC manages "the journal’s chemistry section by commissioning articles and overseeing the peer-review process")

$1350 Cogent Chemistry. Has a "pay what you can" policy. Closed peer review.

$1595 PLoS ONE. Closed peer review.

$1790 Scientific Reports. Closed peer review.

Free or reasonably priced journals that judge perceived impact

$0 Chemical Science. Closed peer review.

$0 CSS Chemistry. Closed peer review.

$0 Beilstein Journal of Organic Chemistry. Closed peer review.

$0 Beilstein Journal of Nanotechnology. Closed peer review.

$0 ACS Central Science. Closed peer review. ($500-1000 for CC-BY. WARNING: not real OA. You still sign away your copyright to the ACS as far as I know.)

$100 Living Journal of Computational Molecular Science. Closed peer review.

€500 Chemistry2. Closed peer review.

£750 RSC Advances. Closed peer review.

Let me know if I have missed anything.

## Wednesday, August 14, 2019

### Machine Learning Basics

The Faculty of Science maintains a list of research presentations that high school classes can choose from when planning a visit. The description of the talk can include links to material the students can use to prepare and keep working on after the visit. So I made a series of video lectures about machine learning and Python for people with no other background than high school level mathematics.

I hope to add more videos/topics as I find the time and I hope this will get some of the students interested in programming and machine learning.

## Tuesday, July 16, 2019

### Planned papers for 2019 - six months in

In January I wrote about the papers I plan to publish in 2019 and made this list:

Submitted
1. Graph-based Genetic Algorithm and Generative Model/Monte Carlo Tree Search for the Exploration of Chemical Space

Probable
2. Screening for energy storage capacity of meta-stable vinylheptafulvenes
3. Testing algorithms for finding the global minimum of drug-like compounds
4. Towards a barrier height benchmark set for biologically relevant systems - part 2
5. SMILES-based genetic algorithms for chemical space exploration

Maybe
6. Further screening of bicyclo[2.2.2]octane-based molecular insulators
7. Screening for electronic properties using a graph-based genetic algorithm
8. Further screening for energy storage capacity of meta-stable vinylheptafulvenes

Six months later the status is:

Accepted

Probably submitted in 2019
While we could certainly have gotten this version published, we decided to write an even better paper where we screen all 200 billion molecules and build an even better machine learning model. We're almost done with the additional calculations.

5. SMILES-based genetic algorithms for chemical space exploration
The calculations are basically done (here, here, and here) and I just started working on the paper now.

3. Testing algorithms for finding the global minimum of drug-like compounds
The coding is basically done and I started generating data for a paper, but then decided to work on paper 5 first. This paper is next.

I think that'll be it for 2019. I went on to the 2nd round for a research center application and had to write a big proposal, so I got behind on paper writing in the Spring. I also decided to spend more time on making excuses :).