Tuesday, June 16, 2020

Generating a random molecule from a chemical formula



Theo posted the following question on the RDKit mailing list:

is there maybe a way with RDKit to generate random (but valid) molecules with a given chemical sumformula?
For example:
C12H9N could generate Carbazole as valid compound.
The output would be mol or SMILES.
Enumerating all the possibilities is actually a difficult problem, but it is not too difficult to whip up code that suggests some possibilities, though some of the suggestions may be pretty unrealistic.

I start by generating a linear hydrocarbon with the correct number of heavy atoms. Then I randomly change some of the carbons to the other heavy atoms in the formula. If there are too many hydrogens, I introduce multiple bonds and rings until the hydrogen count is correct. Here I use some of the mutation operations from my graph-based genetic algorithm.
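To make the idea concrete, here is a minimal pure-Python sketch of the same recipe. It is not the RDKit code from my genetic algorithm: it does the bookkeeping by hand and writes out a SMILES string. The names `VALENCE`, `parse_formula`, `random_graph` and `to_smiles` are made up for this illustration, and I assume standard valences for C, N, O and S only (no charges, no halogens, at most 99 rings). The key observation is that a saturated, acyclic molecule has sum(valence) - 2*(n - 1) hydrogens, and every extra bond or ring removes exactly two of them.

```python
import random
import re

# Assumed standard valences; charged and hypervalent atoms are not handled.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2}
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def parse_formula(formula):
    """Count elements, e.g. 'C12H9N' gives {'C': 12, 'H': 9, 'N': 1}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def random_graph(formula, rng=random, max_tries=1000):
    """Return (atoms, chain bond orders, ring closures) matching the formula."""
    counts = parse_formula(formula)
    target_h = counts.pop("H", 0)
    unsupported = set(counts) - set(VALENCE)
    if unsupported:
        raise ValueError(f"unsupported elements: {sorted(unsupported)}")
    atoms = [el for el, k in counts.items() for _ in range(k)]
    # Hydrogens that must be removed relative to the saturated linear chain.
    excess = sum(VALENCE[a] for a in atoms) - 2 * (len(atoms) - 1) - target_h
    if not atoms or excess < 0 or excess % 2:
        raise ValueError("formula is not reachable with these valences")
    last = len(atoms) - 1
    for _ in range(max_tries):
        rng.shuffle(atoms)  # random placement of the heteroatoms
        orders = [1] * last  # orders[i] is the bond between atoms i and i+1
        rings = []
        # Free valence of each atom once the chain bonds are in place.
        free = [VALENCE[a] - (i > 0) - (i < last) for i, a in enumerate(atoms)]
        for _ in range(excess // 2):
            moves = [("bond", i, i + 1) for i in range(last)
                     if orders[i] < 3 and free[i] and free[i + 1]]
            moves += [("ring", i, j) for i in range(len(atoms))
                      for j in range(i + 2, len(atoms))
                      if free[i] and free[j] and (i, j) not in rings]
            if not moves:
                break  # stuck; reshuffle and start over
            kind, i, j = rng.choice(moves)
            if kind == "bond":
                orders[i] += 1
            else:
                rings.append((i, j))
            free[i] -= 1
            free[j] -= 1
        else:
            return list(atoms), orders, rings
    raise ValueError("no structure found")


def to_smiles(atoms, orders, rings):
    """Write the chain as SMILES, with single-bond ring closures."""
    closures = [""] * len(atoms)
    for label, (i, j) in enumerate(rings, start=1):
        text = str(label) if label < 10 else f"%{label}"
        closures[i] += text
        closures[j] += text
    parts = []
    for i, atom in enumerate(atoms):
        if i > 0:
            parts.append(BOND_SYMBOL[orders[i - 1]])
        parts.append(atom + closures[i])
    return "".join(parts)


print(to_smiles(*random_graph("C12H9N")))
```

Each call prints a different suggestion. Nothing here checks ring strain or chemical sense, so many of the suggestions will be strange; passing the string through RDKit's `Chem.MolFromSmiles` is a reasonable way to confirm that it parses.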

One issue is that it will only produce linear molecules for saturated systems. This can be fixed by adding some branching mutations, e.g. CCCC>>CC(C)C.
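A branching move of this kind can be sketched on a plain list of element symbols. This is only an illustration, not the mutation operation from my genetic algorithm: `branch_chain` and `VALENCE` are hypothetical names, the chain is assumed to be saturated and linear, and standard valences are assumed. Moving the last atom onto an interior atom leaves the hydrogen count unchanged, because one bond is removed and one is added.

```python
import random

# Assumed standard valences for a small organic subset.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2}


def branch_chain(atoms, rng=random):
    """Move the last atom of a saturated linear chain onto an interior atom."""
    chain, moved = atoms[:-1], atoms[-1]
    # Interior atoms of the shortened chain already carry two bonds,
    # so they need a valence of at least 3 to accept a branch.
    sites = [i for i in range(1, len(chain) - 1) if VALENCE[chain[i]] >= 3]
    if not sites:
        return "".join(atoms)  # nothing to branch; keep the linear chain
    k = rng.choice(sites)
    return "".join(chain[:k + 1]) + f"({moved})" + "".join(chain[k + 1:])


print(branch_chain(["C", "C", "C", "C"]))  # CC(C)C
```

For butane there is only one possible branching site, so the result is always isobutane. Longer chains give a random choice among the carbons and nitrogens in the interior.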


This work is licensed under a Creative Commons Attribution 4.0