Thursday 27 November 2008

Does my PNG look bad in this (journal)?

Did you ever submit a carefully-crafted image to a journal only to find that it looks terrible in the proofs from the publisher? This happened to me recently, and it took several iterations with the publisher to figure out what was happening.

The problem turned out to be transparency in PNGs. Basically, when you export a PNG from Inkscape, the background is left transparent and, more importantly, shading (e.g. to remove the jaggedness of diagonal lines) is done by mixing in a transparent layer called the alpha channel. Despite PNGs being around for years, it was only a couple of years ago that Internet Explorer finally got with the programme and handled transparency in PNGs. However, it seems it's still a problem with some publishers.

So, how to "fix" it? Open in Gimp, "Layer", "Transparency", "Remove Alpha Channel", "File", "Save". If you don't have an alpha channel, then the "Remove Alpha Channel" option will not be available.

Update 28/Nov/08: Pierre Lindenbaum suggested an alternative remedy: you can just draw a big white rectangle behind your image in Inkscape and everything will be okay. Paweł Szczęsny points out a better solution: in Inkscape, go to "File", "Document Properties", "Page", "Background", "RGB" and set the four values to 255 (white background, no alpha transparency).

Saturday 8 November 2008

Of OChRe, OSRA and OASA (but not OSCAR)

The field of Optical Chemical Recognition (OChRe) has been around a while with a number of well-established players (see Antony William's post on the subject). There's just a single open source program though, OSRA by Igor Filippov of the NIH, which was released in July 2007. Given an image containing one or several chemical structures, OSRA returns the SMILES string for the structures.

The dependencies of OSRA give an insight into the term "open source ecosystem". Rather than reinvent the wheel, OSRA makes use of open source libraries for optical character recognition (OCRAD, GOCR), bitmap to vector image conversion (POTRACE), messing about with images (ImageMagick, GREYstoration, ThinImage, CImg) and of course, cheminformatics (it uses either OpenBabel or RDKit). Luckily for Windows users, Igor provides a compiled version.

Back in July, I emailed Igor to request a new feature, the ability to output an SDF file containing the coordinates taken from the image. Three hours later he replied that he'd added it. And now, only four months later, I've gotten around to testing it (insert excuse here)...

Anyway, one nice way of testing conversion code is by roundtripping (or "There And Back Again"). So I took the now legendary depiction faceoff test file (see, for example, here), used the coordinates therein to create a PNG image using OASA (via Pybel or cinfony), ran OSRA on the resulting image to get some coordinates, and then used those coordinates to generate a PNG again. By eyeballing the two PNG images, it's possible to discover errors. So, here are the results of the OChRe.

Notes:
(1) If you notice any trends in the errors, comment below and Igor might fix them.
(2) Where there are missing images after the OChRe, this is where OSRA missed a bond (probably reasonably), generated two molecules, and caused OASA a headache (it only handles single molecules).

Here's the code:
import pybel
import popen2

odir = "images"
for mol in pybel.readfile("sdf", "onecomponent.sdf"):
    title = mol.title
    print title
    mol.draw(usecoords=True, show=False, filename=os.path.join(odir, "%s_oasa.png" % title))
    o, i, e = popen2.popen3("../osra-trunk/osra -f sdf %s/%s_oasa.png" % (odir, title))
    osrasdf = o.read()
    newmol = pybel.readstring("sdf", osrasdf)
    try:
        newmol.draw(usecoords=True, show=False, filename=os.path.join(odir, "%s_osra.png" % title))
    except AssertionError:
        print "Unconnected!"

Image credit: Jason.Hudson

Friday 7 November 2008

Food for thought? Citations vs Accesses

Checking out the latest article in Chemistry Central Journal, I was surprised to see that in just 5 days after publication it had 5000 accesses. Today after 8 days, it has around 9000. This already makes it the most accessed paper ever from that journal. The article in question is by DP Naughton and A Petroczi, "Heavy metal ions in wines: meta-analysis of target hazard quotients reveal health risks".

At first, I couldn't figure out what was going on. A quick google, however, shows that the article has been picked up by several bloggers and news sites chasing the "wine is bad for you" angle. I also noticed that the paper is top of the BioMedCentral most viewed articles in the last 30 days. Of the other top 10, 6 are on nutrition or food-related topics.

I guess that this highlights a difference between the number of citations and the number of accesses. The number of citations reflects the impact of the paper within the scientific community, while the number of accesses also includes the impact within the broader community, arguably a more valid measure of its importance :-). Of course, for closed access journals only scientists will be able to access the papers and so the difference will not be so great (except in order of magnitude, that is).