Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Morgan fingerprint rdkit

Working in an example I realized that there are at least two ways of computing morgan fingerprints for a molecule using rdkit. But using the exact same properties in both ways I get different vectors. Am I missing something?

First approach:

info = {}
mol = Chem.MolFromSmiles('C/C1=C\\C[C@H]([C+](C)C)CC/C(C)=C/CC1')
fp = AllChem.GetMorganFingerprintAsBitVect(mol, useChirality=True, radius=2, nBits = 124, bitInfo=info)
vector = np.array(fp)
vector
array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1])

Second approach:

morgan_fp_gen = rdFingerprintGenerator.GetMorganGenerator(includeChirality=True, radius=2, fpSize=124)
mol = Chem.MolFromSmiles('C/C1=C\\C[C@H]([C+](C)C)CC/C(C)=C/CC1')
fp = morgan_fp_gen.GetFingerprint(mol)
vector = np.array(fp)
vector
array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0,
       1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0])

Which are clearly different, even using chirality in both cases.

Besides, there is a way to get the bitInfo from a bit vector using the second approach?

like image 990
Fence Avatar asked Nov 07 '22 15:11

Fence


1 Answers

By default the Morgan Generator uses "count simulation": adding extra bits to a bit vector fingerprint in order to get bit-vector similarities. If you turn this off by passing useCountSimulation=False the fingerprints should be equivalent:

mol = Chem.MolFromSmiles('C/C1=C\\C[C@H]([C+](C)C)CC/C(C)=C/CC1')
fp1 = AllChem.GetMorganFingerprintAsBitVect(mol, useChirality=True, radius=2, nBits=124)
vec1 = np.array(fp1)

morgan_fp_gen = rdFingerprintGenerator.GetMorganGenerator(includeChirality=True, radius=2, fpSize=124, useCountSimulation=False)
fp2 = morgan_fp_gen.GetFingerprint(mol)
vec2 = np.array(fp2)

assert np.all(vec1 == vec2) == True

as for the bitInfo I am not sure this can be done with the second method, although someone may correct me

like image 142
Oliver Scott Avatar answered Nov 15 '22 12:11

Oliver Scott