Sunday, June 9, 2019

Comparison of SMILES-, DeepSMILES-, SELFIES-, and graph-based genetic algorithms

This post is a follow up to this post. There are three main changes: 1) I have included Emilie's code in my code, 2) I have extended the implementation to SELFIES, and 3) the initial pool of molecules is now constructed exactly as described by Brown et al. (i.e. we use the 100 highest scoring molecules from ChEMBL, but remove molecules with scores higher than 0.323).

As before, I run the 10 GA searches, each for 1000 generations, and record the overall highest score found and the average high score. If the score is 1.00 I also record the number of times I found it, in parentheses. I also record the CPU time on 8 cores (note that I stop the search once the score is 1.00, so the time is not necessarily for 10 x 1000 generations).


Here are the high scoring molecules found with string based methods



Bottom line, DeepSMILES and SELFIES perform about the same, and both tend to outperform SMILES for rediscovery using GA.


This work is licensed under a Creative Commons Attribution 4.0

No comments: