
Getting form "action" from BeautifulSoup result

I'm writing a Python parser for a website to automate some work, but I'm not very familiar with Python's "re" (regex) module and can't get it to work.

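import urllib2
import BeautifulSoup  # BeautifulSoup 3, Python 2

req = urllib2.Request(tl2)
req.add_unredirected_header('User-Agent', ua)
try:
    # urlopen() is the call that raises URLError on network failures,
    # so it has to be inside the try block along with read()
    response = urllib2.urlopen(req)
    html = response.read()
except urllib2.URLError, e:
    print "Error while reading data. Are you connected to the interwebz?!", e
    raise SystemExit(1)  # html is undefined here, so stop instead of continuing

soup = BeautifulSoup.BeautifulSoup(html)
form = soup.find('form', id='form_product_page')
pret = form.prettify()

print pret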

Result:

<form id="form_product_page" name="form_1362737440" action="/download/791055/164084/" method="get">
<input id="nojssubmit" type="submit" value="Download" />
</form>

That code works and is just what I need to start with. Now I'm wondering how to extract the "action" attribute from the "form" tag; that is the only thing I need from the BeautifulSoup result.

I've tried form = soup.find('form', id='form_product_page').parent.get('action'), but the result was None. What I want to extract is, for example, "/download/791055/164084/", and this value is different for every URL.


Variables (example):
tl2 = 'http://example.com'
ua = 'Mozilla Firefox / 14.04'
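A minimal sketch of why the `.parent` attempt returns None while reading the attribute from the form itself works. It assumes Python 3 with the `bs4` package (BeautifulSoup 4), not the Python 2 / BeautifulSoup 3 setup in the question, and uses the form markup printed above as sample input; `form_action` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Sample markup: the form printed in the question's result
HTML = """
<form id="form_product_page" name="form_1362737440" action="/download/791055/164084/" method="get">
<input id="nojssubmit" type="submit" value="Download" />
</form>
"""


def form_action(html, form_id="form_product_page"):
    """Hypothetical helper: return the form's action, or None if form or attribute is missing."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", id=form_id)
    if form is None:
        # find() returns None when nothing matches; calling .get() on None would raise
        return None
    # .get() reads an attribute of this tag and returns None if the attribute is absent
    return form.get("action")


print(form_action(HTML))

# .parent climbs to the element enclosing the form (here the document object itself),
# which has no "action" attribute, so .get() returns None
soup = BeautifulSoup(HTML, "html.parser")
parent_action = soup.find("form", id="form_product_page").parent.get("action")
print(parent_action)
```

The None check matters on a real page: if the site changes the form's id, `find()` returns None and chaining `.get()` directly would raise an AttributeError.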
asked May 04 '14 23:05 by sensation





1 Answer

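The action attribute belongs to the form tag itself, so read it from the result of find() directly. Adding .parent moves up to the element that encloses the form, which has no action attribute, so .get('action') returns None. You can do it in one step: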

action = soup.find('form', id='form_product_page').get('action')
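Since the extracted action is a relative path such as "/download/791055/164084/", it usually has to be resolved against the page URL (tl2) before it can be requested. A minimal sketch, assuming Python 3, where `urljoin` lives in `urllib.parse` (in Python 2, as in the question, the same function is in the `urlparse` module); `download_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def download_url(page_url, action):
    """Hypothetical helper: resolve a form's (possibly relative) action against its page URL."""
    if action is None:
        # A form without an action attribute submits to the page's own URL
        return page_url
    # urljoin handles absolute paths ("/download/..."), relative paths and full URLs
    return urljoin(page_url, action)


print(download_url("http://example.com", "/download/791055/164084/"))
```

With the question's example values this prints http://example.com/download/791055/164084/.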
answered Oct 18 '22 03:10 by Casimir et Hippolyte
