
Using Python to ask a web page to run a search

Tags: python, search, web

I have a list of protein names in the "Uniprot" format, and I'd like to convert them all to the MGI format. If you go to www.uniprot.org and type the uniprot protein name into the "Query" bar, it will generate a page with a bunch of information about that protein, including its MGI name (albeit much further down the page).

For example, one Uniprot name is "Q9D880", and by scrolling down, you can see that its corresponding MGI name is "1913775".

I already know how to use Python's urllib to extract the MGI name from a page once I get to that page. What I don't know is how to write Python code that gets the main page to run a query for "Q9D880". My list contains 270 protein names, so it would be nice to avoid copying and pasting each one into the Query bar.

I saw the "Google Search from a Python App" post, and I have a firmer understanding of this concept, but I suspect that running a Google search is different from running the search function on some other website, like uniprot.org.

I'm running Python 2.7.2, but I'm open to implementing solutions that use other versions of Python. Thanks for the help!

Uncle_Dick asked Dec 19 '12 22:12


2 Answers

An easier way to do this is with the requests library. My solution also grabs the information itself from the page using BeautifulSoup4.

All you'd have to do, given a list of your protein names (my_protein_list below), is:

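import requests
from bs4 import BeautifulSoup as BS

for protein in my_protein_list:
    # Each UniProt entry page lives at /uniprot/<protein name>
    text = requests.get('http://www.uniprot.org/uniprot/' + protein).text
    soup = BS(text, 'html.parser')
    # The MGI cross-reference link is identified by its onclick handler
    MGI = soup.find(name='a', onclick="UniProt.analytics('DR-lines', 'click', 'DR-MGI');").text
    MGI = MGI[4:]  # drop the first four characters (the "MGI:" prefix)
    print(protein + ' - ' + MGI)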
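The loop above assumes every page contains the MGI link. If soup.find returns None for some protein, the .text access raises an AttributeError and the run stops partway through the list. One possible guard is sketched below. Note that it swaps BeautifulSoup for a plain regular expression so that it has no third-party dependency; the name extract_mgi_id, the pattern, and the sample HTML fragment are all assumptions made for illustration, and the real page markup may differ.

```python
import re

# Assumed pattern: the MGI cross-reference appears in the page as "MGI:<digits>".
MGI_PATTERN = re.compile(r'MGI:(\d+)')


def extract_mgi_id(html):
    """Return the first MGI number found in the page text, or None if absent."""
    match = MGI_PATTERN.search(html)
    return match.group(1) if match else None


# Made-up fragment for illustration only, not copied from the real page.
sample = '<a href="#">MGI:1913775</a>'
print(extract_mgi_id(sample))
print(extract_mgi_id('<p>no cross-reference here</p>'))
```

Inside the loop, a None result could be logged and skipped rather than letting one missing cross-reference abort all 270 lookups. A regular expression is cruder than matching the specific link, so it may pick up an unrelated "MGI:" mention if the page has one.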
jdotjdot answered Oct 13 '22 01:10


Running the search appears to do a GET on

http://www.uniprot.org/?dataset=uniprot&query=Q9D880&sort=score&url=&lucky=no&random=no

which eventually redirects you to

http://www.uniprot.org/uniprot/Q9D880

So you should be able to use urllib or another HTTP library (I use httplib2) to do a GET on that address, substituting the protein name into the URL so you can search for whichever protein you want.
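As a rough sketch of that idea, the search URL can be assembled from the query parameters seen above using the standard library's urlencode. The helper name build_search_url is hypothetical, and the parameter set is simply copied from the URL observed above; the site may have changed since, and it may well accept fewer parameters.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BASE_URL = 'http://www.uniprot.org/'


def build_search_url(protein):
    """Return the search URL for one UniProt name (hypothetical helper)."""
    # Parameters copied from the URL observed when running a search by hand.
    params = [
        ('dataset', 'uniprot'),
        ('query', protein),
        ('sort', 'score'),
        ('url', ''),
        ('lucky', 'no'),
        ('random', 'no'),
    ]
    return BASE_URL + '?' + urlencode(params)


print(build_search_url('Q9D880'))
```

Passing the resulting URL to requests.get or urllib's urlopen should follow the redirect to the protein's own page, since both follow redirects on GET by default. Given that the redirect target is just the base address plus uniprot/ and the protein name, building that URL directly is likely simpler.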

Silas Ray answered Oct 13 '22 01:10