Wednesday 7 November 2012

Tricks with SMILES and SMARTS

Because of the relationship between SMILES and SMARTS, there are some fun tricks you can do (for some value of fun). For example, over at the NextMove blog I have written about creating a substructure hierarchy (here and here).

Here's another example I came up with in response to a recent question on the OB mailing list from Pascal Muller:
Having a molecule (let's say ethylpyridine CCc1cccnc1) and its scaffold (pyridine c1ccncc1), I would like to create a generic scaffold (smarts) for substructure searches: considered atoms become "any" atom (*), and bonds become "any" bond (~).
I.e., the smarts should be CC*~1~*~*~*~*~*~1 (parts not belonging to the scaffold don't change).

Is there a way in Pybel to mutate atom / bond into "*~" apart from string replacement in the smiles? I anticipate problems with brackets doing so.
And my reply:
Atoms with atomic number 0 are written as *. Bonds with bond order (BO) 4 are written as $. So...the following hack may work for you in most cases. :-)

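>>> import pybel
>>> mol = pybel.readstring("smi", "CC(=O)Cl")
>>> # mol.atoms is 0-based: turn the first atom (the methyl C) into *
>>> mol.atoms[0].OBAtom.SetAtomicNum(0)
>>> print(mol)
*C(=O)Cl
>>> # ...and the third atom (the O)
>>> mol.atoms[2].OBAtom.SetAtomicNum(0)
>>> print(mol)
*C(=*)Cl
>>> # GetBond takes 1-based atom indices: this is the former C=O bond
>>> bond = mol.OBMol.GetBond(2, 3)
>>> bond.GetBO()
2
>>> bond.SetBO(4)
>>> print(mol)
*C($*)Cl
>>> # swap the quadruple-bond symbol for the SMARTS "any bond" symbol
>>> print(mol.write("smi").replace("$", "~"))
*C(~*)Cl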
This probably isn't a general solution, but tricks like these can go a long way towards solving a problem.
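Applied to Pascal's original question, the same hack might look something like the sketch below. It is only a sketch under some assumptions: the function names `dollar_to_tilde` and `generic_scaffold` are made up for illustration, the scaffold is located with a SMARTS match (the first match only), and the `try`/`except` import and the `SetBondOrder` lookup are there because those names changed between Open Babel 2.x and 3.x.

```python
def dollar_to_tilde(smiles):
    """Replace the '$' written for bond order 4 with the SMARTS any-bond '~'."""
    return smiles.replace("$", "~")


def generic_scaffold(smiles, scaffold_smarts):
    """Hypothetical helper: wildcard the atoms and bonds matched by scaffold_smarts."""
    # Imported here so that dollar_to_tilde() is usable without Open Babel.
    try:
        from openbabel import pybel  # Open Babel 3.x
    except ImportError:
        import pybel  # Open Babel 2.x

    mol = pybel.readstring("smi", smiles)
    matches = pybel.Smarts(scaffold_smarts).findall(mol)
    if not matches:
        return None
    scaffold = sorted(matches[0])  # 1-based atom indices of the first match

    # Bonds with both ends in the scaffold get bond order 4 ...
    for pos, i in enumerate(scaffold):
        for j in scaffold[pos + 1:]:
            bond = mol.OBMol.GetBond(i, j)
            if bond is not None:
                # SetBO was renamed SetBondOrder in Open Babel 3.x
                set_order = getattr(bond, "SetBondOrder", None) or bond.SetBO
                set_order(4)
    # ... and the scaffold atoms themselves get atomic number 0.
    for i in scaffold:
        mol.OBMol.GetAtom(i).SetAtomicNum(0)

    return dollar_to_tilde(mol.write("smi").split()[0])
```

A call such as `generic_scaffold("CCc1cccnc1", "c1ccncc1")` is the intended use, and the returned string could then be handed to `pybel.Smarts` for substructure searching. I have only shown the hack on a non-aromatic molecule above, so how aromatic rings come out may depend on the Open Babel version and is worth checking by eye.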
