Tuesday 20 May 2008

Cheminformatics toolkit face-off - Depiction Part 2

[See update (28/10/08).]

One of the aims of the previous toolkit face-off was to get some feedback on the best options for drawing images with different toolkits. I also hoped that a comparison of the images would allow bugs to be easily identified. And finally, I thought that a bit of competition might help improve depiction and structure diagram generators across the board.

Since the last post:
  • Molinspiration have approached me asking to be included in the face-off
  • I've learnt that I should remove hydrogens from CDK and OASA depictions and control the size of the generated image
  • Beda has done amazing work enhancing depiction in OASA, and has in fact now released OASA separately from BKChem as an independent cheminformatics toolkit
  • Greg is just about to release a new version of the RDKit which has improved depiction, and he has fixed the bug identified by PMR
  • And I've fixed some bugs myself in cinfony
Now the images themselves (same dataset as before): [depiction] and [structure diagram generation].

Notes:
  1. The face-off involves the open source toolkits the CDK, OASA and the RDKit, and the proprietary toolkits Cactvs and Molinspiration.
  2. If you want to help improve these depictions or coordinate generators, why not leave a comment below suggesting specific improvements or highlighting things they could do better?
  3. Double bond stereochemistry doesn't seem to be preserved by the CDK, but this is possibly my fault (I'm awaiting a reply to an email to the cdk-users mailing list).
  4. Several people complained to me last time that I didn't give OASA a fair chance. To make up for it, this time round I've made OASA the star attraction. The coordinates generated by the three open source toolkits are all depicted by OASA.

2 comments:

Rich Apodaca said...

Noel, very well done - and very helpful.

In your work with Structure Diagram Generation, did you find anything like a quantitative analysis of the "goodness" of a 2D layout in any of the libraries?

In other words, a (separable) piece of code that looks at arbitrary 2D coordinates and assigns a score based on deviations from ideal bond angles, collisions with other atoms, and other criteria?

In that case, you could use this code to rank the quality of 2D coordinate generation and compare it with eyeballing.

Such code could also be useful in developing new 2D coordinate layout algorithms.

Noel O'Boyle said...

The short answer is no, but for sure, one could imagine such a piece of software. Perhaps some of the toolkit developers could comment on this.
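
For illustration only, here is a minimal sketch of what such a layout score might look like. It is not taken from any of the toolkits in the face-off. The function name layout_penalty, the 120 degree ideal angle, the 0.5 clash distance and the equal weighting of the two terms are all assumptions made for the example. It takes arbitrary 2D coordinates plus a bond list and returns a penalty, where lower is better.

```python
import math
from itertools import combinations


def layout_penalty(coords, bonds, ideal_angle=120.0, min_dist=0.5):
    """Hypothetical 2D layout score: 0.0 is ideal, larger is worse.

    coords: list of (x, y) tuples, one per atom.
    bonds:  list of (i, j) atom index pairs.
    ideal_angle and min_dist are assumed values, not toolkit defaults.
    """
    # Adjacency list built from the bond list.
    neighbours = {i: [] for i in range(len(coords))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Angle term: only atoms with 2 or 3 neighbours are scored, since
    # 120 degrees is a sensible target for those (a simplification that
    # will penalise small rings).
    angle_term = 0.0
    for centre, nbrs in neighbours.items():
        if len(nbrs) not in (2, 3):
            continue
        cx, cy = coords[centre]
        for i, j in combinations(nbrs, 2):
            ux, uy = coords[i][0] - cx, coords[i][1] - cy
            vx, vy = coords[j][0] - cx, coords[j][1] - cy
            # Unsigned angle in [0, 180] from the cross and dot products.
            cross = ux * vy - uy * vx
            dot = ux * vx + uy * vy
            angle = abs(math.degrees(math.atan2(cross, dot)))
            angle_term += ((angle - ideal_angle) / ideal_angle) ** 2

    # Collision term: non-bonded atom pairs closer than min_dist.
    bonded = {frozenset(b) for b in bonds}
    clash_term = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in bonded:
            continue
        d = math.dist(coords[i], coords[j])
        if d < min_dist:
            clash_term += ((min_dist - d) / min_dist) ** 2

    return angle_term + clash_term


# A three-atom chain drawn as a clean zigzag, and the same chain with
# the third atom placed on top of the first.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
chain_bonds = [(0, 1), (1, 2)]
good_layout = [(-c, s), (0.0, 0.0), (c, s)]
bad_layout = [(c, s), (0.0, 0.0), (c, s)]

good_score = layout_penalty(good_layout, chain_bonds)
bad_score = layout_penalty(bad_layout, chain_bonds)
print(good_score, bad_score)
```

Running the same function over the coordinates produced by each toolkit for the same molecule would give a crude ranking to set against eyeballing the images. A real scorer would need more terms (bond length uniformity, bond crossings, ring geometry) and tuned weights.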