Thursday 28 June 2012

Drawing vector diagrams with Python

What's the easiest way to draw nice vector diagrams with Python? I was asked this question some time ago by a bioinformatician who was moving from Perl to Python and had previously used the gd library. gd was widely used back in the day, but looks a bit clunky now.

Here's the solution I came up with: use SVG through PySVG. It's pure Python, so it doesn't need any extra libraries installed, and installing those is always a pain when dealing with graphics.

I used the code below to generate an SVG file, which was converted to the above PNG with Inkscape:
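
The listing itself isn't shown here, so what follows is only a minimal sketch of the same idea, not the code used for the figure. It builds a small SVG with Python's standard xml.etree.ElementTree module instead of PySVG (whose API isn't reproduced here). The function names, shapes and colours are all made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialise SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def _tag(name):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (SVG_NS, name)


def build_diagram(width=220, height=100):
    """Build a toy diagram: two labelled boxes joined by a line (illustrative only)."""
    svg = ET.Element(_tag("svg"), {
        "width": str(width), "height": str(height),
        "viewBox": "0 0 %d %d" % (width, height),
    })
    # Connector drawn first so the boxes sit on top of it.
    ET.SubElement(svg, _tag("line"), {
        "x1": "70", "y1": "50", "x2": "150", "y2": "50",
        "stroke": "black", "stroke-width": "2",
    })
    for x, label in ((10, "Perl"), (150, "Python")):
        ET.SubElement(svg, _tag("rect"), {
            "x": str(x), "y": "30", "width": "60", "height": "40",
            "rx": "6", "fill": "lightsteelblue", "stroke": "black",
        })
        text = ET.SubElement(svg, _tag("text"), {
            "x": str(x + 30), "y": "55",
            "text-anchor": "middle", "font-family": "sans-serif",
            "font-size": "12",
        })
        text.text = label
    return ET.ElementTree(svg)


def save_svg(tree, path):
    """Write the diagram to disk as a standalone SVG file."""
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Calling save_svg(build_diagram(), "diagram.svg") should give a file that opens in Inkscape or a web browser.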

Anyone else got any suggestions?

Wednesday 27 June 2012

How to store stereochemistry in Mol files II

Following on from earlier work, I decided to resurrect Open Babel's support for storing stereo in 0D Mol files. I always liked this idea, but at the time I removed it before release because no-one else liked it. Well, now Craig James tells me he likes it too, so it's back in and will be available in OB 2.3.2.

Simply put, in OB 2.3.2 tetrahedral and cis/trans stereo can be roundtripped through a 0D Mol file by using an extension to the de facto standard, as follows:

obabel -:"I/C=C/C[C@](F)(Br)Cl" -omol | obabel -imol -osmi
I/C=C/C[C@](F)(Br)Cl

With the current release, cis/trans stereo is only preserved if you generate 2D coordinates ("--gen2D") although tet stereo can be read from the chiral flags for 0D files if you specify "-as".

How does the extension work? Well, for tet stereo it just uses the chiral flags (in defiance of the spec which says to ignore chiral flags on reading - take that spec!!). For cis/trans stereo it uses Up/Down markings equivalent to those used by SMILES; all of the (at most) 4 stereobonds are given Up/Down markings; if two bonds at either end are both Up or both Down this implies cis.
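
To make the cis/trans rule concrete, here's a tiny sketch of one reading of it. It assumes each Up/Down marking is expressed looking outward from the double-bond atom, and that mismatched markings mean trans (by analogy with SMILES; the description above only spells out the cis case). The function and constant names are hypothetical, and this isn't Open Babel's code.

```python
UP, DOWN = "up", "down"


def double_bond_config(mark_left, mark_right):
    """Classify a double bond from one marked bond at each end.

    Assumption: both markings are given relative to their own
    double-bond atom. Matching markings are read as cis; mismatched
    markings are assumed to mean trans.
    """
    for mark in (mark_left, mark_right):
        if mark not in (UP, DOWN):
            raise ValueError("expected UP or DOWN, got %r" % (mark,))
    return "cis" if mark_left == mark_right else "trans"
```

Under those assumptions, double_bond_config(UP, UP) and double_bond_config(DOWN, DOWN) both give "cis", while a mixed pair gives "trans".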

A better way to do it would have been to use a double bond flag equivalent to the chiral flag - e.g. 1 means that the first two bonds that appear in the bond section are cis - this would be both easier to compute and to interpret; however it would push the spec a bit too far.

Thursday 7 June 2012

Holy moley - A blessed ordering of atoms

I know I shouldn't cast the first stone but sometimes only a saint could turn the other cheek.

Here's some background from Wikipedia:
In computer science, canonicalization (...also sometimes standardization or normalization) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form.
Not to be confused with Canonization...
Canonization (or canonisation) is the act by which a Christian church declares a deceased person to be a saint, upon which declaration the person is included in the canon, or list, of recognized saints.
...and here are some papers from JCICS:
MOLGEN-CID - A Canonizer for Molecules and Graphs Accessible through the Internet 
The Signature Molecular Descriptor. 4. Canonizing Molecules Using Extended Valence Sequences

A search of ACS publications shows seven references to canonizer in total, and 32 references to canonization.

Okay, so I'm a native speaker, and yes, I mispell words too. I still think it's funny. :-)

Wednesday 6 June 2012

Arrr, I be at the EBI

I am currently visiting the EMBL-EBI (courtesy of Christoph Steinbeck), and listening to the sound of an English summer pelting the roof like no-one's business. If you're around (or even a square) and interested in meeting up to discuss something cheminformaticky, feel free to drop me a line. (Update: I'm back home now)