
Generate 2d images of molecules from PubChem FTP data

Rather than crawl PubChem's website, I'd prefer to be nice and generate the images locally from the data on the PubChem FTP site:

ftp://ftp.ncbi.nih.gov/pubchem/specifications/

The only problem is that I'm limited to OS X and Linux, and I can't find a way to programmatically generate the 2D images shown on their site. See this example:

https://pubchem.ncbi.nlm.nih.gov/compound/6#section=Top

Under the heading "2D Structure" we have this image here:

https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=6&t=l

That is what I'm trying to generate.

asked Sep 17 '15 14:09 by zachaysan


2 Answers

If you want something that works out of the box, I would suggest molconvert from ChemAxon's Marvin (https://www.chemaxon.com/products/marvin/), which is free for academics. It is easy to use from the command line and supports plenty of input and output formats. For your example it would be:

molconvert "png" -s "C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl" -o cdnb.png

Resulting in the following image:

[Image: 1-chloro-2,4-dinitrobenzene]

It also allows you to set parameters such as width, height, quality, background color and so on.


However, if you are a programmer I would definitely recommend RDKit. The following code generates images for a pair of compounds given as SMILES strings.

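from rdkit import Chem
from rdkit.Chem import Draw

# Each entry is [SMILES string, output file name without extension]
ms_smis = [["C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl", "cdnb"],
           ["C1=CC(=CC(=C1)N)C(=O)N", "3aminobenzamide"]]
# Parse each SMILES string into an RDKit molecule, keeping its name
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]

# Write one 800x800 SVG file per molecule
for m in ms:
    Draw.MolToFile(m[0], m[1] + ".svg", size=(800, 800))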

This gives you the following images: cdnb.svg and 3aminobenzamide.svg.
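Both tools above take SMILES as input, while PubChem's bulk compound files are distributed as SDF, so the SMILES strings have to be pulled out first. Here is a minimal, standard-library-only sketch of that step. It assumes each record carries the data items PUBCHEM_COMPOUND_CID and PUBCHEM_OPENEYE_CAN_SMILES; check the tag names against the files you actually download, since they are an assumption here. The function names and the SAMPLE record are made up for illustration and are not real PubChem output.

```python
import io
import re
from typing import Dict, Iterator, TextIO

# Matches an SDF data-item header such as "> <PUBCHEM_COMPOUND_CID>"
_HEADER = re.compile(r">.*?<([^>]+)>")


def iter_sdf_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one {tag: value} dict per SDF record (records end with '$$$$')."""
    fields: Dict[str, str] = {}
    tag = None      # data item currently being read, if any
    values = []     # value lines collected for that item
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):
            if tag is not None:
                fields[tag] = "\n".join(values)
            yield fields
            fields, tag, values = {}, None, []
            continue
        header = _HEADER.match(line)
        if header:
            if tag is not None:
                fields[tag] = "\n".join(values)
            tag, values = header.group(1), []
        elif tag is not None:
            if line == "":
                # A blank line terminates the current data item
                fields[tag] = "\n".join(values)
                tag = None
            else:
                values.append(line)
        # Lines outside any data item (the mol block) are ignored here


def smiles_by_cid(handle: TextIO,
                  smiles_tag: str = "PUBCHEM_OPENEYE_CAN_SMILES") -> Dict[str, str]:
    """Map CID -> SMILES, skipping records that lack either data item."""
    result = {}
    for record in iter_sdf_records(handle):
        cid = record.get("PUBCHEM_COMPOUND_CID")
        smiles = record.get(smiles_tag)
        if cid and smiles:
            result[cid] = smiles
    return result


# Hand-written stand-in for a single record; not copied from PubChem
SAMPLE = """6
  -OEChem-

  0  0  0     0  0  0  0  0  0999 V2000
M  END
> <PUBCHEM_COMPOUND_CID>
6

> <PUBCHEM_OPENEYE_CAN_SMILES>
C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl

$$$$
"""

if __name__ == "__main__":
    for cid, smiles in smiles_by_cid(io.StringIO(SAMPLE)).items():
        print(cid, smiles)
```

Each (CID, SMILES) pair can then be handed to molconvert or to Chem.MolFromSmiles as shown above. If RDKit is available, reading the SDF mol blocks directly is probably the better route, since it avoids depending on a particular SMILES tag.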

answered Oct 23 '22 18:10 by David Hoksza


So I also emailed the PubChem guys and they got back to me very quickly with this response:

The only bulk access we have to images is through the download service: https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi
You can request up to 50,000 images at a time.

Which is better than I was expecting, but still not amazing since it requires downloading things that I in theory could generate locally. So I'm leaving this question open until some kind soul writes an open source library to do the same.
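Given the 50,000-image limit quoted above, a long list of CIDs would have to be split across several download requests. A minimal sketch of that batching step follows; the function and constant names are my own, and how each batch is actually submitted to the download service is left out, since the reply does not describe it.

```python
from typing import Iterable, Iterator, List

# Per-request limit quoted in PubChem's reply
MAX_IMAGES_PER_REQUEST = 50_000


def batch_cids(cids: Iterable[int],
               batch_size: int = MAX_IMAGES_PER_REQUEST) -> Iterator[List[int]]:
    """Yield consecutive lists of at most batch_size CIDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[int] = []
    for cid in cids:
        batch.append(cid)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter, batch
        yield batch


if __name__ == "__main__":
    for i, batch in enumerate(batch_cids(range(1, 120_001)), start=1):
        print(f"request {i}: {len(batch)} CIDs, from {batch[0]} to {batch[-1]}")
```

Because it is a generator, this works on a CID list that is streamed from a file without loading everything into memory at once.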

Edit:

I figure I might as well save people some time if they are doing the same thing as I am. I've created a Ruby gem built on Mechanize to automate the downloading of images. Please be kind to their servers and only download what you need.

https://github.com/zachaysan/pubchem

gem install pubchem

answered Oct 23 '22 16:10 by zachaysan