I am writing a function for some existing python code that will be passed a Mechanize browser object as a parameter.
I fill in some details in a form in the browser, and use response = browser.submit() to move the browser to a new page, where I collect some information.
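Roughly, the relevant part looks something like this (the form index and field name are placeholders, not the real ones from my code; the function assumes it is handed a mechanize.Browser):

def fill_and_submit(browser):
    # Placeholder form index and field name
    browser.select_form(nr=0)
    browser["some_field"] = "some value"
    response = browser.submit()   # this is the call that sometimes fails
    return response.read()        # collect some information from the new page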
Unfortunately, I occasionally get the following error:
httperror_seek_wrapper: HTTP Error 500: Internal Server Error
I've navigated to the page in my own browser, and sure enough, I occasionally see this error directly, so I think this is a server problem, not anything to do with robots.txt, headers or similar.
The problem is that after submitting, the state of the browser object changes and I can't continue to use it. My first thought was to try taking a deep copy first and using that if I ran into problems, but that gives the error TypeError: object.__new__(cStringIO.StringO) is not safe, use cStringIO.StringO.__new__(), as described here.
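For reference, the failed deep-copy attempt was simply something like this:

import copy

# Take a snapshot before submitting, so the copy could be used on failure.
# This raises:
#   TypeError: object.__new__(cStringIO.StringO) is not safe, use cStringIO.StringO.__new__()
# presumably because the browser's current response wraps a cStringIO buffer
# that can't be deep-copied.
browser_backup = copy.deepcopy(browser)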
I've also tried using browser.back(), but I get NoneType errors.
Does anyone have a good solution to this?
A great solution below uses the excellent requests library (docs here). requests has functionality to fill in a form and submit via post or get, which importantly doesn't change the state of the br object.
An excellent website allows us to test various error codes, and has a form interface at the top that I've tested this on. I create a br object at this site, then define a function that selects the form from br and pulls out the relevant information, but does the submit via requests, so that the br object hasn't changed and is re-usable. Error codes cause requests to return rubbish, but they don't render br unusable.
As stated below, this involves a little more setup time, but is well worth it.
import mechanize
import requests

def testErrorCodes(br, theCodes):
    for x in theCodes:
        br.select_form(nr=0)
        theAction = br.action
        payload = {'code': x}
        response = requests.post(theAction, data=payload)
        print(response.status_code)

br = mechanize.Browser()
br.set_handle_robots(False)
response = br.open("http://savanttools.com/test-http-status-codes")

testErrorCodes(br, [401, 402, 403, 404, 500, 503, 504])  # Prints the error codes
testErrorCodes(br, [404])  # The browser is still alive and well to be used again!
It's been a while since I've written Python, but I think I have a workaround for your problem. Try this method:

import mechanize
import requests

accepted_code = 200  # whatever status you consider a success

try:
    browser.submit()
except mechanize.HTTPError:
    while True:  ## DANGER ##
        ## You will need to format and/or decode the POST for your form
        response = requests.post('http://yourwebsite.com/formlink', data=None, json=None)
        ## If the server will accept JSON formatting, this becomes trivial
        if response.status_code == accepted_code:
            break
You can find documentation about the requests library here. I personally think that requests is better for your case than mechanize... but it does require a little more overhead from you, in that you need to break down the submission to a raw POST using some kind of RESTful interceptor in your browser.
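If you'd rather not capture the POST by hand, one rough, untested sketch is to read the field names and values off the form mechanize has already parsed and replay them with requests (this assumes br.select_form(...) has already been called; field handling is simplified):

import requests

def replay_form_with_requests(br):
    form = br.form
    # Collect the named, enabled controls mechanize parsed out of the HTML.
    payload = {c.name: c.value for c in form.controls
               if c.name is not None and not c.disabled}
    # Replay the submission outside mechanize, so br's state is untouched.
    return requests.post(form.action, data=payload)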
Ultimately, though, by passing in br you are restricting yourself to the way that mechanize handles browser states on br.submit().
I'm assuming that you want the submission to happen even if it takes multiple tries.
The solution that I thought of is certainly not efficient, but it should work.
def do_something_in_mechanize():
    <...insert your code here...>
    try:
        browser.submit()
        <...rest of your code...>
    except mechanize.HTTPError:
        do_something_in_mechanize()
Basically, it'll call the function until the action is performed without HTTPErrors.
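If the error can repeat many times, you may prefer a plain loop with a retry cap over unbounded recursion; a rough, untested variant of the same idea (placeholders kept as comments):

import mechanize

def do_something_in_mechanize(max_tries=5):
    # Same idea as above, but iterative and capped so a persistently
    # failing server can't make it retry forever.
    for attempt in range(max_tries):
        try:
            # <...insert your code here...>
            browser.submit()
            # <...rest of your code...>
            return
        except mechanize.HTTPError:
            continue
    raise RuntimeError("gave up after %d failed submits" % max_tries)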