 

A Python script that automatically inputs text into a website and gets the resulting source code

I am doing biomedical named entity extraction using Python.

Now I have to cross-check my results by entering the text at http://text0.mib.man.ac.uk/software/geniatagger/ and parsing the HTML source of the page I get back after submitting it.

I want to do the same thing from within the GUI I have made: take the input from my GUI, submit it to this website, and get the source code back, so that I don't have to open the site in a browser every time I cross-check.

Thanks in advance

Md Faisal asked Nov 29 '25 00:11


1 Answer

Actually, this is a great question!

The first thing to do is explore the website's source code a little. If you look at it, you will see this block of code:

<form method="POST" action="a.cgi">
<p>
Please enter a text that you want to analyze.
</p>
<p>
<textarea name="paragraph" rows="15" cols="80" wrap="soft">
... some text here ...
### This is a sample. Replace this with your own text.

</textarea>
</p>
<p>
<input type="submit" value="Submit Text" />
<input type="reset" />
</p>
</form>

What you can see is that the request is sent to the relative address a.cgi. Since we are already at the address

http://text0.mib.man.ac.uk/software/geniatagger/

the data we want to send will go to a.cgi appended to that address:

http://text0.mib.man.ac.uk/software/geniatagger/a.cgi
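Resolving a relative form action against the page address is exactly what the standard library's `urllib.parse.urljoin` does, so you don't have to concatenate strings by hand. A minimal sketch; the variable names are only illustrative:

```python
from urllib.parse import urljoin

# Address of the page that contains the form
page_url = "http://text0.mib.man.ac.uk/software/geniatagger/"

# The form's action attribute is relative ("a.cgi"), so it is resolved
# against the page address, the same way a browser resolves it
form_action = "a.cgi"
post_url = urljoin(page_url, form_action)

print(post_url)  # http://text0.mib.man.ac.uk/software/geniatagger/a.cgi
```

Note the trailing slash on the page address: without it, `urljoin` would replace the last path segment (`geniatagger`) instead of appending to it.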

But what are we going to send there? The text goes in the POST parameter "paragraph": we know the request is a POST because the form's method attribute is POST, and the parameter name comes from the textarea's name attribute, "paragraph".
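If you would rather not read the form by eye, the same information (method, action and field names) can be pulled out programmatically. A minimal sketch using the standard library's `html.parser.HTMLParser`; `FormInspector` is a hypothetical name, and the HTML is a trimmed copy of the form quoted earlier:

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collects each form's method, action and named fields (sketch)."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when the method attribute is missing
            self.forms.append({
                "method": (attrs.get("method") or "GET").upper(),
                "action": attrs.get("action", ""),
                "fields": [],
            })
        elif tag in ("input", "textarea", "select") and self.forms:
            # Only controls with a name are submitted with the form
            if attrs.get("name"):
                self.forms[-1]["fields"].append(attrs["name"])


# Trimmed copy of the GENIA tagger form
FORM_HTML = """
<form method="POST" action="a.cgi">
<textarea name="paragraph" rows="15" cols="80" wrap="soft"></textarea>
<input type="submit" value="Submit Text" />
<input type="reset" />
</form>
"""

inspector = FormInspector()
inspector.feed(FORM_HTML)
print(inspector.forms)
# [{'method': 'POST', 'action': 'a.cgi', 'fields': ['paragraph']}]
```

The submit and reset buttons have no name attribute, so the only field that would be sent is "paragraph".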

We can submit it with this Python code:

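# Python 3: the old urllib/urllib2 functions now live in urllib.parse and urllib.request
import urllib.parse
import urllib.request

text = """
Further, while specific constitutive binding to the peri-kappa B site is seen in monocytes, stimulation with phorbol esters induces additional, specific binding. Understanding the monocyte-specific function of the peri-kappa B factor may ultimately provide insight into the different role monocytes and T-cells play in HIV pathogenesis.

### This is a sample. Replace this with your own text.
"""

# The textarea is named "paragraph", so that is the POST parameter name
data = {"paragraph": text}

# urlencode builds the form body; POST data must be bytes in Python 3
encoded_data = urllib.parse.urlencode(data).encode("utf-8")

# Supplying data makes urlopen send a POST request to the form's action URL
with urllib.request.urlopen(
        "http://text0.mib.man.ac.uk/software/geniatagger/a.cgi",
        encoded_data) as content:
    # Assumes the page is UTF-8; undecodable bytes are replaced
    html = content.read().decode("utf-8", errors="replace")

print(html)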
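To hook this up to a GUI button later, it helps to wrap the request in a function. A minimal sketch, assuming Python 3; `build_tagger_request` and `tag_text` are hypothetical helper names, the 30-second timeout is an arbitrary choice, and the UTF-8 fallback is an assumption about the server. Building the request separately also lets you inspect what would be sent without contacting the server:

```python
import urllib.parse
import urllib.request

# Form action URL of the GENIA tagger page
GENIA_URL = "http://text0.mib.man.ac.uk/software/geniatagger/a.cgi"


def build_tagger_request(text, url=GENIA_URL):
    """Build (but do not send) the POST request the form would make."""
    body = urllib.parse.urlencode({"paragraph": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def tag_text(text, timeout=30):
    """Send the text to the tagger and return the response HTML (sketch)."""
    request = build_tagger_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


req = build_tagger_request("IL-2 gene expression")
print(req.get_method())  # POST
print(req.data)          # b'paragraph=IL-2+gene+expression'
```

A GUI button handler would then only need to call `tag_text(...)` with the contents of your text widget.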

What do we have so far? An "engine" for your GUI program. Optionally, you can parse the returned HTML with Python's HTMLParser. You mentioned that you want to display the result in a GUI: you can do that with GTK or Qt by mapping this functionality to a single button. Read a tutorial for your toolkit; for this purpose it is really easy. If you run into problems, just comment on this post and I can extend this answer with the GUI part.
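For the optional parsing step, here is a minimal sketch with the standard library's `html.parser.HTMLParser` that pulls the visible text out of the returned page. It assumes nothing about the layout of the tagger's actual output, which you would need to inspect yourself before extracting specific fields; `TextExtractor` and `html_to_text` are hypothetical names, and the sample HTML is made up:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside script/style
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the page's text chunks joined by newlines."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.chunks)


# Made-up sample page, not real tagger output
sample = "<html><style>p {color: red}</style><body><p>IL-2</p><p>gene</p></body></html>"
print(html_to_text(sample))
```

The resulting string can be shown directly in a text widget of whichever GUI toolkit you choose.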

Jan Vorcak answered Nov 30 '25 12:11


