
Python beautiful soup form input parsing

My goal is to grab a list of all input names and values, pair them up, and submit the form. The names and values are randomised.

from bs4 import BeautifulSoup # parsing

html = """
<html>
<head id="Head1"><title>Title Page</title></head>
<body>
    <form id="formS" action="login.asp?dx=" method="post">

    <input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' >
    <input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' >
    <input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' >
    </form>

</body>

</html>
"""

html_proc = BeautifulSoup(html)

This bit works fine:

print html_proc.find("input", value=True)["value"]
> gDcZHY+nV

However the following statements don't work or don't work as hoped:

print html_proc.find("input", name=True)["name"]
> TypeError: find() got multiple values for keyword argument 'name'

print html_proc.findAll("input", value=True, attrs={'value'})
> []  

print html_proc.findAll('input', value=True)
> <input name="qw1NWJOJi/E8IyqHSHA==" type="hidden" value="gDcZHY+nV">
> <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
> <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
> </input></input></input>, <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" 
> value="kgDcZHY+n">
> <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
> </input></input>, <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4p
> UhnV"></input>
sarasimple asked Apr 11 '14 00:04


2 Answers

You cannot submit a form with BeautifulSoup, but here's how you can get the list of (name, value) pairs:

print([(element['name'], element['value']) for element in html_proc.find_all('input')])

prints:

[('qw1NWJOJi/E8IyqHSHA==', 'gDcZHY+nV'), 
 ('sfqwWJOJi/E8DFDHSHB==', 'kgDcZHY+n'), 
 ('Jsfqw1NdddfDDSDKKSL==', 'rNg4pUhnV')]
alecxe answered Nov 13 '22 05:11


d = {e['name']: e.get('value', '') for e in html_proc.find_all('input', {'name': True})}
print(d)

prints:

{'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n', 
 'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV', 
 'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV'}

Building on @alecxe's answer, this avoids a KeyError when an input has no value attribute, and it collects the form fields into a dictionary that can be passed straight to requests.

import requests

url = 'http://example.com/' + html_proc.form['action']
requests.post(url, data=d)

Though if this gets any more complicated (cookies, scripts), you might want to use Mechanize.
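Concatenating the host and the form's action works for this page, but it can break when the action is an absolute path or the page lives in a subdirectory. A minimal sketch of a more forgiving approach, using the standard library's urljoin; the page URL here is a hypothetical placeholder, not something from the question:

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that served the form (an assumption).
page_url = "http://example.com/account/page.asp"

# The action attribute as it appears in the question's HTML.
form_action = "login.asp?dx="

# Resolve the action relative to the page, the way a browser would.
post_url = urljoin(page_url, form_action)
print(post_url)

# An absolute-path action replaces the whole path instead.
root_url = urljoin(page_url, "/login.asp")
print(root_url)
```

The resulting post_url can then be passed to requests.post in place of the hand-built string.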


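The TypeError arises because the first positional parameter of find() is itself called name, so passing name=True as a keyword collides with the tag name "input" that was already given positionally. Use html_proc.find("input", attrs={'name': True}) instead. Likewise, the attrs parameter expects a dictionary, so pass {'value': True} rather than the set {'value'}.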
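To make that concrete, here is a small sketch of the attrs form. It assumes bs4 is installed, uses a shortened copy of the question's form with quoted attributes, and picks the built-in "html.parser" explicitly (the question did not name a parser):

```python
from bs4 import BeautifulSoup

html = """
<form id="formS" action="login.asp?dx=" method="post">
  <input type="hidden" name="qw1NWJOJi/E8IyqHSHA==" value="gDcZHY+nV">
  <input type="hidden" name="sfqwWJOJi/E8DFDHSHB==" value="kgDcZHY+n">
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# Filter on the HTML attribute called "name" through attrs, so it does not
# clash with find()'s own first parameter, which is also called name.
first = soup.find("input", attrs={"name": True})
first_name = first["name"]

# attrs takes a dictionary; True means "the attribute is present".
with_value = soup.find_all("input", attrs={"value": True})
values = [tag["value"] for tag in with_value]

print(first_name)
print(values)
```

The same attrs dictionary works with find_all(), which is what the dictionary comprehension in this answer relies on.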

Bob Stein answered Nov 13 '22 05:11