
Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML.

With Python 2.6, I normally fetch a page using:

    import urllib2
    url = "http://google.com"
    data = urllib2.urlopen(url)
    data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

urllib2.HTTPError: HTTP Error 500: Internal Server Error 

How can I fetch such error pages (with or without urllib2), even though they return Internal Server Errors?

Note that with Python 3, the corresponding exception is urllib.error.HTTPError.

backus asked Feb 10 '10 00:02


2 Answers

HTTPError is itself a file-like object: it doubles as the response that the server sent back. You can catch it and then read its contents just as you would read a normal response.

    try:
        resp = urllib2.urlopen(url)
        contents = resp.read()
    except urllib2.HTTPError as error:
        # The exception object holds the body of the error page.
        contents = error.read()
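
On Python 3 the same pattern applies, with urllib.request.urlopen and urllib.error.HTTPError in place of the urllib2 names. The following is a minimal sketch rather than a definitive implementation: the function name fetch_body and its opener parameter are illustrative assumptions (the parameter only exists so the function can be exercised without a network connection) and are not part of urllib.

```python
import urllib.error
import urllib.request


def fetch_body(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, even if the server answers with an HTTP error.

    `opener` is a hypothetical hook for this sketch; it defaults to urlopen.
    """
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as error:
        # HTTPError is file-like, so the error page can be read straight from it.
        return error.read()
```

Calling fetch_body(url) on the failing URL would then return the HTML of the 500 page as bytes instead of raising.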

Joe Holloway answered Sep 20 '22 04:09


If you mean you want to read the body of the 500:

    request = urllib2.Request(url, data, headers)
    try:
        resp = urllib2.urlopen(request)
        print resp.read()
    except urllib2.HTTPError as error:
        print "ERROR: ", error.read()

In your case, you don't need to build up the request. Just do:

    try:
        resp = urllib2.urlopen(url)
        print resp.read()
    except urllib2.HTTPError as error:
        print "ERROR: ", error.read()

So you don't override urllib2.HTTPError; you just handle the exception.
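
If the caller also needs to tell a successful page from an error page, the status code can be returned alongside the body, since the exception exposes it as its code attribute. A hedged Python 3 sketch follows; fetch_status_and_text, its opener hook, and the UTF-8 default are assumptions made for illustration, not something the question specifies.

```python
import urllib.error
import urllib.request


def fetch_status_and_text(url, opener=urllib.request.urlopen, encoding="utf-8"):
    """Return (status_code, decoded_body) for successful and failed requests alike.

    `opener` and `encoding` are hypothetical parameters for this sketch.
    """
    try:
        with opener(url) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as error:
        # The exception carries the status in .code and the error page via .read().
        status, raw = error.code, error.read()
    # errors="replace" avoids a UnicodeDecodeError if the page uses another encoding.
    return status, raw.decode(encoding, errors="replace")
```

The exception is still only handled, never overridden; the function just normalizes both outcomes into the same return shape.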

sberry answered Sep 22 '22 04:09