Sunday, September 3, 2017

Automatic generation of a set of molecules

Many quantum chemistry projects have reached a point where setup and analysis consume more human time than CPU time, e.g. it takes all day to set up enough input files to keep the computer busy overnight. Many people use scripts to automatically extract and analyse the data, but few use scripts to generate the coordinates.

Here I show how this can easily be done using the RDKit toolkit.  The following Python script adds F and OH substituents to benzene, making all 91 possible combinations of mono- through hexa-substituted molecules.

However, the script can easily be changed to do something else by changing parent_smiles and rxn_smarts_list on lines 4 and 5.  If you are not familiar with SMILES, start here; there are also plenty of GUIs, such as this, that generate SMILES.
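A minimal sketch of how such a script might look is given below. It is an illustration under assumptions, not necessarily the exact script: the variable names parent_smiles, rxn_smarts_list and substitutions come from the text, while the helper functions substitute_once and generate are hypothetical names chosen for this example.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

parent_smiles = "c1ccccc1"  # benzene
rxn_smarts_list = ["[cX3;H1:1]>>[*:1]F", "[cX3;H1:1]>>[*:1]O"]  # add F, add OH
substitutions = 6  # maximum number of substitution rounds


def substitute_once(smiles_set, rxn_smarts_list):
    """Apply every reaction once to every molecule; return unique canonical SMILES.

    Hypothetical helper, not part of RDKit.
    """
    reactions = [AllChem.ReactionFromSmarts(s) for s in rxn_smarts_list]
    products = set()
    for smiles in smiles_set:
        mol = Chem.MolFromSmiles(smiles)
        for rxn in reactions:
            for product_tuple in rxn.RunReactants((mol,)):
                # Reaction products are not sanitized, so round-trip through
                # SMILES to sanitize them and get a canonical form.
                new_mol = Chem.MolFromSmiles(Chem.MolToSmiles(product_tuple[0]))
                if new_mol is not None:
                    products.add(Chem.MolToSmiles(new_mol))
    return products


def generate(parent_smiles, rxn_smarts_list, substitutions):
    """Collect the unique molecules reachable in 1..substitutions rounds."""
    current = {parent_smiles}
    found = set()
    for _ in range(substitutions):
        current = substitute_once(current, rxn_smarts_list)
        found |= current
    return sorted(found)


molecules = generate(parent_smiles, rxn_smarts_list, substitutions)
print(len(molecules))
for smiles in molecules[:5]:
    print(smiles)
```

Storing canonical SMILES in a set is what removes duplicates such as the many equivalent ways of making fluorobenzene. If the sketch behaves as intended, the printed count should match the 91 molecules mentioned above, and each SMILES string can then be turned into 3D coordinates with whatever tool you normally use.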

To use reaction SMARTS you have to learn SMARTS, which can be a bit tricky, but it is a very powerful tool. For example, if you change [cX3;H1:1]>>[*:1]F to [*;H1:1]>>[*:1]F then the program will add F to any atom with one H, i.e. also to the OH group to create the OF substituent.  So if you set substitutions = 2, you'll get mono-substituted Ph-OF in addition to mono- and di-substituted Ph-F and Ph-OH.
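A quick way to see what a change like this does is to test the reactant pattern on its own before running any reactions. The sketch below counts the atoms in phenol that each pattern matches; count_matches is a hypothetical helper written for this example, and phenol is just a convenient test molecule.

```python
from rdkit import Chem

phenol = Chem.MolFromSmiles("Oc1ccccc1")


def count_matches(mol, smarts):
    """Return how many times a SMARTS pattern matches in mol (hypothetical helper)."""
    pattern = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(pattern))


ring_only = count_matches(phenol, "[cX3;H1:1]")  # aromatic C-H atoms only
any_atom = count_matches(phenol, "[*;H1:1]")     # any atom carrying exactly one H
print(ring_only, any_atom)
```

The stricter pattern matches only the five ring carbons that carry an H, while the looser one also matches the hydroxyl oxygen, which is why the second version can produce the OF group.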

Similarly, if you add [cX3;H1:1][cX3;H1:2]>>[c:1]1[c:2]cccc1 to the list (and use substitutions = 2) you'll get un- and mono-substituted naphthalene as well as un-substituted anthracene and phenanthrene.
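As a small check of the ring-fusion rule on its own, the sketch below applies it once to benzene. The variable names fuse_ring and fused are made up for this example, and the SMILES round-trip is an assumption about how best to sanitize the raw reaction products.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Ring-fusion rule from the text: two adjacent aromatic C-H atoms become
# the shared edge of a new six-membered aromatic ring.
fuse_ring = AllChem.ReactionFromSmarts("[cX3;H1:1][cX3;H1:2]>>[c:1]1[c:2]cccc1")
benzene = Chem.MolFromSmiles("c1ccccc1")

fused = set()
for product_tuple in fuse_ring.RunReactants((benzene,)):
    # Round-trip through SMILES to sanitize and canonicalize the product.
    mol = Chem.MolFromSmiles(Chem.MolToSmiles(product_tuple[0]))
    if mol is not None:
        fused.add(Chem.MolToSmiles(mol))

print(fused)
```

Every pair of adjacent C-H atoms in benzene is equivalent, so the many raw products should collapse to a single canonical SMILES for naphthalene. Applying the rule a second time is what gives anthracene and phenanthrene.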

In my experience, the only thing that limits what I can build with this approach is my understanding of SMARTS.  Hope this is of some use to you.

This work is licensed under a Creative Commons Attribution 4.0