Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I talk to UniProt over HTTP in Python?

I'm trying to get some results from UniProt, which is a protein database (details are not important). I'm trying to use some script that translates from one kind of ID to another. I was able to do this manually on the browser, but could not do it in Python.

In http://www.uniprot.org/faq/28 there are some sample scripts. I tried the Perl one and it seems to work, so the problem is my Python attempts. The (working) script is:

## tool_example.pl ##
use strict;
use warnings;
use LWP::UserAgent;

my $base = 'http://www.uniprot.org';
my $tool = 'mapping';
my $params = {
  from => 'ACC', to => 'P_REFSEQ_AC', format => 'tab',
  query => 'P13368 P20806 Q9UM73 P97793 Q17192'
};

my $agent = LWP::UserAgent->new;
push @{$agent->requests_redirectable}, 'POST';
print STDERR "Submitting...\n";
my $response = $agent->post("$base/$tool/", $params);

while (my $wait = $response->header('Retry-After')) {
  print STDERR "Waiting ($wait)...\n";
  sleep $wait;
  print STDERR "Checking...\n";
  $response = $agent->get($response->base);
}

$response->is_success ?
  print $response->content :
  die 'Failed, got ' . $response->status_line . 
    ' for ' . $response->request->uri . "\n";

My questions are:

1) How would you do that in Python?

2) Will I be able to massively "scale" that (i.e., use a lot of entries in the query field)?

like image 550
R S Avatar asked Apr 03 '09 20:04

R S


4 Answers

question #1:

This can be done using python's urllibs:

import urllib, urllib2
import time
import sys

query = ' '.join(sys.argv)   

# encode params as a list of 2-tuples
params = ( ('from','ACC'), ('to', 'P_REFSEQ_AC'), ('format','tab'), ('query', query))
# url encode them
data = urllib.urlencode(params)    
url = 'http://www.uniprot.org/mapping/'

# fetch the data
try:
    foo = urllib2.urlopen(url, data)
except urllib2.HttpError, e:
    if e.code == 503:
        # blah blah get the value of the header...
        wait_time = int(e.hdrs.get('Retry-after', 0))
        print 'Sleeping %i seconds...' % (wait_time,)
        time.sleep(wait_time)
        foo = urllib2.urlopen(url, data)


# foo is a file-like object, do with it what you will.
foo.read()
like image 98
vezult Avatar answered Nov 15 '22 10:11

vezult


You're probably better off using the Protein Identifier Cross Reference service from the EBI to convert one set of IDs to another. It has a very good REST interface.

http://www.ebi.ac.uk/Tools/picr/

I should also mention that UniProt has very good webservices available. Though if you are tied to using simple http requests for some reason then its probably not useful.

like image 45
niallhaslam Avatar answered Nov 15 '22 10:11

niallhaslam


Let's assume that you are using Python 2.5. We can use httplib to directly call the web site:

import httplib, urllib
querystring = {}
#Build the query string here from the following keys (query, format, columns, compress, limit, offset)
querystring["query"] = "" 
querystring["format"] = "" # one of html | tab | fasta | gff | txt | xml | rdf | rss | list
querystring["columns"] = "" # the columns you want comma seperated
querystring["compress"] = "" # yes or no
## These may be optional
querystring["limit"] = "" # I guess if you only want a few rows
querystring["offset"] = "" # bring on paging 

##From the examples - query=organism:9606+AND+antigen&format=xml&compress=no
##Delete the following and replace with your query
querystring = {}
querystring["query"] =  "organism:9606 AND antigen" 
querystring["format"] = "xml" #make it human readable
querystring["compress"] = "no" #I don't want to have to unzip

conn = httplib.HTTPConnection("www.uniprot.org")
conn.request("GET", "/uniprot/?"+ urllib.urlencode(querystring))
r1 = conn.getresponse()
if r1.status == 200:
   data1 = r1.read()
   print data1  #or do something with it

You could then make a function around creating the query string and you should be away.

like image 1
Andrew Cox Avatar answered Nov 15 '22 11:11

Andrew Cox


check this out bioservices. they interface a lot of databases through Python. https://pythonhosted.org/bioservices/_modules/bioservices/uniprot.html

conda install bioservices --yes
like image 1
O.rka Avatar answered Nov 15 '22 10:11

O.rka