During a Twitter discussion, Noel O'Boyle introduced me to Graph Edit Distance (GDE) as a useful measure of molecular similarity. Its advantages over other approaches, such as Tanimoto similarity, are discussed in these slides by Roger Sayle.
It turns out NetworkX can compute this, so it's relatively easy to interface with RDKit. The implementation is shown below.
Unfortunately, the time required for computing GDE increases exponentially with molecule size, so this implementation is not really of practical use.
Sayle's slides discuss one solution to this, but it's far from trivial to implement. If you know of other open source implementations, please let me know.
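To make the scaling problem concrete, here is a minimal brute-force sketch in plain Python. It is not how NetworkX computes the distance and it is not the approach from Sayle's slides. It simply tries every possible atom-to-atom mapping and counts unit-cost edits, so the work grows as n! for n atoms. The function name `brute_force_ged` and the dictionary-based graph representation are my own assumptions, chosen for illustration only.

```python
from itertools import permutations

def brute_force_ged(nodes1, edges1, nodes2, edges2):
    """Unit-cost graph edit distance by trying every node mapping.

    nodes*: list of node labels (here: atomic numbers)
    edges*: dict mapping frozenset({i, j}) -> edge label (here: bond order)
    """
    n = max(len(nodes1), len(nodes2))
    # Pad the smaller graph with None "dummy" nodes (deleted or inserted atoms)
    a = list(nodes1) + [None] * (n - len(nodes1))
    b = list(nodes2) + [None] * (n - len(nodes2))
    best = None
    for perm in permutations(range(n)):  # n! candidate mappings
        # Node cost: 1 for every relabelled, deleted or inserted node
        cost = sum(a[i] != b[perm[i]] for i in range(n))
        # Edge cost: 1 for every pair whose bond differs or exists on one side only
        for i in range(n):
            for j in range(i + 1, n):
                e1 = edges1.get(frozenset((i, j)))
                e2 = edges2.get(frozenset((perm[i], perm[j])))
                cost += e1 != e2
        if best is None or cost < best:
            best = cost
    return best

# Kekulized benzene: six carbons in a ring with alternating double/single bonds
carbons = [6] * 6
benzene = {frozenset((i, (i + 1) % 6)): 2 if i % 2 == 0 else 1 for i in range(6)}
# Hexatriene: the same chain without the ring-closing single bond
hexatriene = {frozenset((i, i + 1)): 2 if i % 2 == 0 else 1 for i in range(5)}

print(brute_force_ged(carbons, benzene, carbons, hexatriene))  # 1
```

Six atoms mean 720 mappings, which is instant. Twenty atoms would mean roughly 2.4 × 10^18 mappings, which is why an exact search of this kind is only usable for very small molecules.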
Update: GitHub page

This work is licensed under a Creative Commons Attribution 4.0
'''
Written by Jan H. Jensen, 2020
'''
from rdkit import Chem
import networkx as nx

def get_graph(mol):
    # Kekulize so aromatic bonds become explicit single and double bonds
    Chem.Kekulize(mol)
    atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    # Adjacency matrix with bond orders as the off-diagonal entries
    am = Chem.GetAdjacencyMatrix(mol, useBO=True)
    # Put each atomic number on the diagonal (it becomes a weighted self-loop)
    for i, atom in enumerate(atoms):
        am[i, i] = atom
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
    G = nx.from_numpy_array(am)
    return G

mol1 = Chem.MolFromSmiles('c1ccccc1')
#mol2 = Chem.MolFromSmiles('c1cnccc1')
mol2 = Chem.MolFromSmiles('C=CC=CC=C')

G1 = get_graph(mol1)
G2 = get_graph(mol2)

# Edges (and the self-loops carrying atomic numbers) match only if weights are equal
GDE = nx.graph_edit_distance(G1, G2, edge_match=lambda a, b: a['weight'] == b['weight'])
print(GDE)
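The listing above encodes the element of each atom as a self-loop weight on the diagonal of the adjacency matrix. A possible variant, sketched below, stores the element as a node attribute and the bond order as an edge attribute, then compares them with `node_match` and `edge_match`. The helper name `mol_to_graph` and the attribute names `element` and `order` are hypothetical, and the `timeout` value is an arbitrary choice. When the timeout is hit, NetworkX returns the best distance found so far, which is an upper bound rather than the exact value.

```python
from rdkit import Chem
import networkx as nx

def mol_to_graph(mol):
    """Hypothetical helper: atoms become labelled nodes, bonds labelled edges."""
    mol = Chem.Mol(mol)  # work on a copy so the caller's molecule is not modified
    Chem.Kekulize(mol, clearAromaticFlags=True)
    G = nx.Graph()
    for atom in mol.GetAtoms():
        G.add_node(atom.GetIdx(), element=atom.GetAtomicNum())
    for bond in mol.GetBonds():
        G.add_edge(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
                   order=bond.GetBondTypeAsDouble())
    return G

G1 = mol_to_graph(Chem.MolFromSmiles('c1ccccc1'))
G2 = mol_to_graph(Chem.MolFromSmiles('C=CC=CC=C'))

ged = nx.graph_edit_distance(
    G1, G2,
    node_match=lambda a, b: a['element'] == b['element'],
    edge_match=lambda a, b: a['order'] == b['order'],
    timeout=10,  # seconds; assumption: stop searching after this long
)
print(ged)
```

This does not remove the exponential scaling; it only caps the time spent. NetworkX also provides `nx.optimize_graph_edit_distance`, a generator that yields successively better upper bounds, which may be more convenient when an approximate distance is acceptable.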
