Calling urllib2.urlopen on a link to an article fetched from an RSS feed leads to the following error:
urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Moved Permanently
According to the documentation, urllib2 supports redirects.
In Java the problem was solved by just calling
HttpURLConnection.setFollowRedirects(true);
How can I solve it with Python?
UPDATE
The link I'm having problems with:
http://feeds.nytimes.com/click.phdo?i=8cd5af579b320b0bfd695ddcc344d96c
Turns out you need to enable cookies. The page sets a cookie and then redirects to itself. Because urllib2 does not handle cookies by default, the cookie is never sent back and the redirect repeats until urllib2 gives up, so you have to add cookie handling yourself.
import urllib2
from cookielib import CookieJar

# Store cookies between requests so the self-redirect can succeed.
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
p = opener.open("http://feeds.nytimes.com/click.phdo?i=8cd5af579b320b0bfd695ddcc344d96c")
print p.read()
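To see why the cookie matters, here is a minimal sketch that reproduces the behaviour against a local test server instead of the real feed. It assumes Python 3, where urllib2 and cookielib live on as urllib.request and http.cookiejar. The handler class, the cookie name "seen" and the path "/article" are all made up for illustration, and the real server may behave differently.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGateHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the feed server: it redirects a
    request to the same URL until the client sends the cookie back."""

    def do_GET(self):
        if "seen=1" in self.headers.get("Cookie", ""):
            body = b"article body"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Set the cookie, then redirect to the very same path.
            self.send_response(301)
            self.send_header("Set-Cookie", "seen=1")
            self.send_header("Location", self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_both_ways():
    """Fetch the same URL without and with cookie handling."""
    server = HTTPServer(("127.0.0.1", 0), CookieGateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/article" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # ignore system proxies locally
    try:
        # Default opener: no cookies, so the redirect never resolves.
        try:
            urllib.request.build_opener(no_proxy).open(url)
            plain = "ok"
        except urllib.error.HTTPError as err:
            plain = "HTTP Error %d" % err.code
        # Opener with a cookie jar: the cookie is returned on the redirect.
        cookie_opener = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(CookieJar()))
        with cookie_opener.open(url) as response:
            with_cookies = response.read()
    finally:
        server.shutdown()
        server.server_close()
    return plain, with_cookies
```

Calling fetch_both_ways() should give "HTTP Error 301" for the plain opener and the page body for the cookie-aware one, which matches the error in the question and the fix above.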
Nothing wrong with @sleeplessnerd's solution, but this is very, very slightly more elegant:
import urllib2
url = "http://stackoverflow.com/questions/9926023/handling-rss-redirects-with-python-urllib2"
p = urllib2.build_opener(urllib2.HTTPCookieProcessor).open(url)
print p.read()
In fact, if you look at the inline documentation for the CookieJar class, it more or less tells you to do things this way:
You may not need to know about this class: try urllib2.build_opener(HTTPCookieProcessor).open(url)