I'm using RDKit and trying to check molecules for an exact match. After using Chem.MolFromSmiles(), the expression m == p apparently doesn't lead to the desired result. Of course, I can check whether p is a substructure of m and whether m is a substructure of p, but to me this looks too complicated. I couldn't find, or overlooked, a code example for an exact match in the RDKit documentation. How do I do this correctly? Thank you for any hints.
Code:
from rdkit import Chem
myPattern = 'c1ccc2c(c1)c3ccccc3[nH]2' # Carbazole
myMolecule = 'C1=CC=C2C(=C1)C3=CC=CC=C3N2' # Carbazole
m = Chem.MolFromSmiles(myMolecule)
p = Chem.MolFromSmiles(myPattern)
print(m == p) # returns False, first (unsuccessful) attempt to check for identity
print(m.HasSubstructMatch(p)) # returns True
print(p.HasSubstructMatch(m)) # returns True
print(m.HasSubstructMatch(p) and p.HasSubstructMatch(m)) # returns True, so are the molecules identical?
To check whether two different SMILES strings represent the same molecule, you can canonicalize both and compare the results.
from rdkit import Chem
myPattern = 'c1ccc2c(c1)c3ccccc3[nH]2'
myMolecule = 'C1=CC=C2C(=C1)C3=CC=CC=C3N2'
a = Chem.CanonSmiles(myPattern)
b = Chem.CanonSmiles(myMolecule)
print(a) # prints c1ccc2c(c1)[nH]c1ccccc12
print(b) # prints c1ccc2c(c1)[nH]c1ccccc12
print(a == b) # prints True
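If you already have the Mol objects m and p from the question, the same idea can be applied without going back to the input strings. The following is a minimal sketch, assuming both molecules parsed successfully (Chem.MolFromSmiles returns None for invalid input). The helper name same_molecule and the dibenzofuran contrast molecule are my own additions for illustration, not part of RDKit or the question.

```python
from rdkit import Chem


def same_molecule(mol_a, mol_b):
    """Return True if both Mol objects yield the same canonical SMILES.

    Hypothetical helper, not an RDKit function.
    """
    # Chem.MolToSmiles writes canonical SMILES by default, so equal
    # strings mean RDKit regards the two structures as the same.
    return Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)


m = Chem.MolFromSmiles('C1=CC=C2C(=C1)C3=CC=CC=C3N2')  # carbazole, Kekule form
p = Chem.MolFromSmiles('c1ccc2c(c1)c3ccccc3[nH]2')     # carbazole, aromatic form
q = Chem.MolFromSmiles('c1ccc2c(c1)c3ccccc3o2')        # dibenzofuran, for contrast

print(same_molecule(m, p))
print(same_molecule(m, q))
```

This should print True for the two carbazole inputs and False for carbazole against dibenzofuran. Whether stereochemistry or different tautomers count as "the same molecule" depends on how the SMILES are written, so treat this as a starting point rather than a universal identity test.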
My RDKit knowledge isn't great and their documentation is famously terrible, but I have done this kind of thing myself. A (perhaps over-engineered) method would be to generate a graph with networkx and just compare the nodes and edges.
This is surprisingly simple: use RDKit to read the file or SMILES string, then generate the topology on the fly. If you create an RDKit Mol object from a SMILES string as you have above, you would then do:
import networkx as nx

def topology_from_rdkit(rdkit_molecule):
    topology = nx.Graph()
    for atom in rdkit_molecule.GetAtoms():
        # Add the atoms as nodes
        topology.add_node(atom.GetIdx())
        # Add the bonds as edges
        for bonded in atom.GetNeighbors():
            topology.add_edge(atom.GetIdx(), bonded.GetIdx())
    return topology

def is_isomorphic(topology1, topology2):
    return nx.is_isomorphic(topology1, topology2)
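One thing to watch with a bare topology graph is that it stores only connectivity, so two different molecules with the same skeleton (benzene and pyridine, for example) would compare as isomorphic. A possible refinement, sketched below, is to store element, formal charge, hydrogen count, and bond order as node and edge attributes and pass match functions to nx.is_isomorphic. The function names labelled_graph_from_rdkit and same_labelled_graph, and the choice of attributes, are my own assumptions for illustration.

```python
import networkx as nx
from rdkit import Chem


def labelled_graph_from_rdkit(rdkit_molecule):
    """Build a graph whose nodes and edges carry basic chemical labels."""
    graph = nx.Graph()
    for atom in rdkit_molecule.GetAtoms():
        graph.add_node(
            atom.GetIdx(),
            element=atom.GetAtomicNum(),
            charge=atom.GetFormalCharge(),
            hydrogens=atom.GetTotalNumHs(),
        )
    for bond in rdkit_molecule.GetBonds():
        # GetBondTypeAsDouble gives 1.0, 1.5 (aromatic), 2.0, 3.0
        graph.add_edge(
            bond.GetBeginAtomIdx(),
            bond.GetEndAtomIdx(),
            order=bond.GetBondTypeAsDouble(),
        )
    return graph


def same_labelled_graph(graph1, graph2):
    """Isomorphism test that also requires matching atom and bond labels."""
    return nx.is_isomorphic(
        graph1,
        graph2,
        node_match=lambda a, b: a == b,  # compare full attribute dicts
        edge_match=lambda a, b: a == b,
    )


benzene = labelled_graph_from_rdkit(Chem.MolFromSmiles('c1ccccc1'))
pyridine = labelled_graph_from_rdkit(Chem.MolFromSmiles('c1ccncc1'))

print(nx.is_isomorphic(benzene, pyridine))     # connectivity only
print(same_labelled_graph(benzene, pyridine))  # connectivity plus labels
```

The first comparison ignores the labels and should report the two six-membered rings as isomorphic, while the second should tell them apart. This sketch still ignores stereochemistry and isotopes, so it is not a complete identity check.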