Friday 2 March 2012

It all adds up to a new descriptor

Group contribution descriptors such as TPSA and LogP are pretty popular. I've just written up some docs describing how to add a new group contribution descriptor to Open Babel. It's fairly easy to do once you have a set of SMARTS strings and contributions.
Group contribution descriptors are a common type of molecular descriptor whose value is a sum of contributions from substructures of the molecule. Such a descriptor can easily be added to Open Babel without the need to recompile the code. All you need is a SMARTS string for each group and the corresponding contribution to the descriptor value.

The following example shows how to add a new descriptor, hellohalo, whose value increases by 1, 2, 3 or 4 for each F, Cl, Br and I atom (respectively) in the molecule...
To read the rest, check out the development docs.
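
To make the arithmetic behind hellohalo concrete, here is a rough, toolkit-independent sketch in Python. It is not how Open Babel implements these descriptors: a real group contribution descriptor matches SMARTS patterns against the molecule, whereas this sketch just looks up element symbols in a dictionary. The names HELLOHALO_CONTRIBUTIONS and group_contribution are made up for illustration.

```python
# Hypothetical sketch of the "sum of contributions" idea; not Open Babel code.
# Contribution per halogen atom, as in the hellohalo example:
# F = 1, Cl = 2, Br = 3, I = 4. Anything else contributes nothing.
HELLOHALO_CONTRIBUTIONS = {"F": 1, "Cl": 2, "Br": 3, "I": 4}


def group_contribution(atoms, contributions):
    """Sum the contribution of every atom whose symbol has an entry.

    `atoms` is a list of element symbols, standing in for the SMARTS
    matches a real toolkit would find in the molecule.
    """
    return sum(contributions.get(symbol, 0) for symbol in atoms)


# Heavy atoms of 1-bromo-2-chloroethane (SMILES: BrCCCl)
atoms = ["Br", "C", "C", "Cl"]
print(group_contribution(atoms, HELLOHALO_CONTRIBUTIONS))  # 3 (Br) + 2 (Cl) = 5
```

A real pattern file swaps the dictionary keys for SMARTS strings, which is what lets a group be more specific than a bare element.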

I'm currently planning* to add a new option to make debugging of these descriptors simpler, so now might be a good time to look into implementing a descriptor if you're interested.

If you do implement a new descriptor, please consider the trees, the number of towels washed every day, and the effort you will save another PhD student, and contribute the result back to Open Babel. If you do so under a liberal license (e.g. CC0), the same SMARTS strings can also be used by other cheminformatics toolkits.

* Now done. In the development version, add ";debug" to the start of the pattern file.
