I'm writing a Python parser for a website to automate a task, but I'm not very familiar with the "re" (regex) module and can't get it to work.
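import urllib2
import BeautifulSoup

# tl2 (the page URL) and ua (the User-Agent string) are defined elsewhere
req = urllib2.Request(tl2)
req.add_unredirected_header('User-Agent', ua)
try:
    # urlopen() is the call that raises URLError, so it belongs inside the try
    response = urllib2.urlopen(req)
    html = response.read()
except urllib2.URLError, e:
    print "Error while reading data. Are you connected to the interwebz?!", e
    raise  # html is undefined past this point, so stop here
soup = BeautifulSoup.BeautifulSoup(html)
# find() returns the first <form> whose id matches
form = soup.find('form', id='form_product_page')
pret = form.prettify()
print pret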
Result:
<form id="form_product_page" name="form_1362737440" action="/download/791055/164084/" method="get">
<input id="nojssubmit" type="submit" value="Download" />
</form>
That code works and is just what I need to start with. Now I'm wondering how to extract the "action" attribute from the "form" tag. That is the only thing I need from the BeautifulSoup result.
I've tried form = soup.find('form', id='form_product_page').parent.get('action')
but the result was None. What I want to extract is, for example, "/download/791055/164084/". This value is different for every URL.
The find() method returns the first tag that matches the given name and attributes, as a Tag object, or None if nothing matches.
A tag's text attribute returns its text content as a string, with all HTML tags stripped.
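The .parent call moves up to the element that encloses the form, and that element has no action attribute, so get() returns None. Drop .parent and call get() on the form tag itself, in one step: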
action = soup.find('form', id='form_product_page').get('action')
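For anyone who wants to try the same lookup without installing BeautifulSoup, here is a minimal sketch that swaps in the standard library's html.parser instead. It is written for Python 3, not the Python 2 used above. The names FormActionParser and extract_form_action, the wrapping div, and the example.com base URL are all made up for illustration; only the form markup comes from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Collect the action attribute of the first <form> with a given id."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.action = None  # stays None if no matching form is found

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "form" and self.action is None:
            attributes = dict(attrs)
            if attributes.get("id") == self.form_id:
                self.action = attributes.get("action")


def extract_form_action(html, form_id):
    """Return the action of the form with this id, or None."""
    parser = FormActionParser(form_id)
    parser.feed(html)
    return parser.action


# Form markup copied from the question, wrapped in a hypothetical <div>
sample_html = """
<div class="wrapper">
<form id="form_product_page" name="form_1362737440" action="/download/791055/164084/" method="get">
<input id="nojssubmit" type="submit" value="Download" />
</form>
</div>
"""

action = extract_form_action(sample_html, "form_product_page")
print(action)

# The action is a relative path, so join it with the page URL before
# requesting it. The base URL here is a placeholder, not the real site.
download_url = urljoin("http://example.com/product/", action)
print(download_url)
```

The attribute is read from the form tag itself, never from its parent, which is the same idea as the one-line BeautifulSoup call above. Because the action is a relative path, it usually needs to be joined with the page's own URL before it can be passed to another request.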