I want to scrape this web page using Mechanize. The form element looks like this:
<form name="ctl00" method="post" action="PSearchResults.aspx?state=ME&rp=" id="ctl00">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="verylongstring" /> </div>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAgKb7POZAwK4v7ffCOmari00yJft/iuZBMdOH/zh9TDI" />
</div>
</form>
I'm using Mechanize to print out the controls, but it can only see two of them. If I run this:
br.select_form(name='ctl00')
br.form.set_all_readonly(False) # allow changing the .value of all controls
for control in br.form.controls:
if not control.name:
print " - (type) =", (control.type)
continue
print " - (name, type, value) =", (control.name, control.type, br[control.name])
all that gets printed is this:
- (name, type, value) = ('__VIEWSTATE', 'hidden', '/wEPDwUGNDQ5NTMwD2QWAgIBD2QWAgIHD2QWCgIBDw8WAh4E...more
- (name, type, value) = ('__EVENTVALIDATION', 'hidden', '/wEWAgKb7POZAwK4v7ffCOmari00yJft/iuZBMdOH/zh9TDI')
Why can't Mechanize 'see' the __EVENTTARGET and __EVENTARGUMENT fields?
The site is checking the useragent and serving a different page to mechanize
specifying this as the useragent seems to work ok
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Here is a link showing how to set the User-Agent with mechanize
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With