My goal is to grab a list of all input names and values, pair them up, and submit the form. The names and values are randomised.
from bs4 import BeautifulSoup # parsing
html = """
<html>
<head id="Head1"><title>Title Page</title></head>
<body>
<form id="formS" action="login.asp?dx=" method="post">
<input type=hidden name=qw1NWJOJi/E8IyqHSHA== value='gDcZHY+nV' >
<input type=hidden name=sfqwWJOJi/E8DFDHSHB== value='kgDcZHY+n' >
<input type=hidden name=Jsfqw1NdddfDDSDKKSL== value='rNg4pUhnV' >
</form>
</body>
</html>
"""
html_proc = BeautifulSoup(html)
This bit works fine:
print(html_proc.find("input", value=True)["value"])
> gDcZHY+nV
However, the following statements either fail or don't work as hoped:
print(html_proc.find("input", name=True)["name"])
> TypeError: find() got multiple values for keyword argument 'name'
print(html_proc.findAll("input", value=True, attrs={'value'}))
> []
print(html_proc.findAll('input', value=True))
> <input name="qw1NWJOJi/E8IyqHSHA==" type="hidden" value="gDcZHY+nV">
> <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
> <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
> </input></input></input>, <input name="sfqwWJOJi/E8DFDHSHB==" type="hidden" value="kgDcZHY+n">
> <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV">
> </input></input>, <input name="Jsfqw1NdddfDDSDKKSL==" type="hidden" value="rNg4pUhnV"></input>
You cannot submit a form with BeautifulSoup, but here's how you can get the list of (name, value) pairs:
print([(element['name'], element['value']) for element in html_proc.find_all('input')])
prints:
[('qw1NWJOJi/E8IyqHSHA==', 'gDcZHY+nV'),
('sfqwWJOJi/E8DFDHSHB==', 'kgDcZHY+n'),
('Jsfqw1NdddfDDSDKKSL==', 'rNg4pUhnV')]
d = {e['name']: e.get('value', '') for e in html_proc.find_all('input', {'name': True})}
print(d)
prints:
{'sfqwWJOJi/E8DFDHSHB==': 'kgDcZHY+n',
'qw1NWJOJi/E8IyqHSHA==': 'gDcZHY+nV',
'Jsfqw1NdddfDDSDKKSL==': 'rNg4pUhnV'}
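Building on @alecxe's answer, this avoids KeyErrors (inputs without a name are skipped, and a missing value defaults to an empty string) and parses the form into a dictionary that is ready to pass to requests: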
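To make the KeyError point concrete, here is a small sketch. It assumes bs4 is installed, uses the "html.parser" builder, and runs against a hypothetical form (not the one from the question) that has a text field without a value and a submit button without a name. The variable is called fields so it doesn't clash with d above.

```python
from bs4 import BeautifulSoup

# Hypothetical form: a text field with no value and a submit button with no name.
html = """
<form action="login.asp?dx=" method="post">
<input type="hidden" name="qw1NWJOJi/E8IyqHSHA==" value="gDcZHY+nV">
<input type="text" name="username">
<input type="submit" value="Log in">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# Indexing every <input> directly stops at the first missing attribute
# (here the "username" field, which has no value).
missing = None
try:
    pairs = [(e["name"], e["value"]) for e in soup.find_all("input")]
except KeyError as exc:
    missing = exc.args[0]
print("missing attribute:", missing)

# Keep only inputs that have a name; default a missing value to "".
fields = {e["name"]: e.get("value", "") for e in soup.find_all("input", {"name": True})}
print(fields)
```

The nameless submit button is dropped entirely, which matches what a browser does: an input without a name is not sent with the form.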
import requests
url = 'http://example.com/' + html_proc.form['action']
requests.post(url, data=d)
Though if this gets any more complicated (cookies, scripts), you might want to use Mechanize.
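The requests.post call above builds the URL by concatenating a base with the form's action, which only works if the page sits at the site root. A sketch using only the standard library shows two related details: urljoin resolves a relative action against the page's URL, and urlencode shows how the randomised names and values end up in the POST body. The page URL here is hypothetical, and my understanding is that requests form-encodes a dict passed as data= in essentially the same way.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical URL of the page the form was scraped from.
page_url = "http://example.com/members/index.asp"
action = "login.asp?dx="  # the form's action attribute, as in the question

# urljoin resolves the relative action against the page URL,
# which plain string concatenation does not do.
post_url = urljoin(page_url, action)
print(post_url)

# Form-encode one pair: '/', '=' and '+' in the randomised values are
# percent-escaped, so the server won't decode '+' as a space.
d = {"qw1NWJOJi/E8IyqHSHA==": "gDcZHY+nV"}
body = urlencode(d)
print(body)
```

This is also why building the body by hand is risky: an unescaped '+' in a value such as gDcZHY+nV would reach the server as a space.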
The reason for the TypeError is that the first parameter of find() is itself called name (it is the tag name), so find("input", name=True) gives that parameter two values. Instead, use html_proc.find("input", attrs={'name': True}). Also, for the attrs parameter, use the dictionary {'value': True} instead of the set {'value'}.
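A short sketch of the corrected calls, assuming bs4 is installed and using the "html.parser" builder on a trimmed copy of the question's form (attribute values quoted here for clarity). It reproduces the TypeError and then filters through attrs instead.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the question's form (attribute values quoted).
html = """
<form id="formS" action="login.asp?dx=" method="post">
<input type="hidden" name="qw1NWJOJi/E8IyqHSHA==" value="gDcZHY+nV">
<input type="hidden" name="sfqwWJOJi/E8DFDHSHB==" value="kgDcZHY+n">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# find()'s first parameter is already called `name` (the tag name), so passing
# "input" positionally and name=True as a keyword gives it two values.
type_error = None
try:
    soup.find("input", name=True)
except TypeError as exc:
    type_error = str(exc)
print("TypeError:", type_error)

# Filter on the HTML name attribute through attrs instead.
first = soup.find("input", attrs={"name": True})
print(first["name"], first["value"])

# attrs must be a dict; True means "the attribute is present".
with_value = soup.find_all("input", attrs={"value": True})
print([e["value"] for e in with_value])
```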