urllib2 HTTP error 429

Tags: python, urllib2, http-status-code-429, reddit

I have a list of subreddits and I'm using urllib2 to open them. As I go through them, urllib2 eventually fails with:

urllib2.HTTPError: HTTP Error 429: Unknown

Doing some research, I found that reddit limits the number of requests to its servers by IP:

Make no more than one request every two seconds. There's some allowance for bursts of requests, but keep it sane. In general, keep it to no more than 30 requests in a minute.
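The quoted rule (no more than one request every two seconds) can be enforced on the client side with a small throttle. This is a minimal sketch in Python 3, assuming a single-threaded crawler; `Throttle` and its parameters are hypothetical names, and the clock and sleep functions are injectable only so the class can be tested without actually waiting. On Python 2, substitute `time.time` for `time.monotonic`.

```python
import time


class Throttle:
    """Block in wait() until at least `min_interval` seconds have passed since the previous call."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._last = None    # time of the previous request; None before the first one

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Usage sketch (hypothetical loop; fetch() stands for whatever opens the page):
# throttle = Throttle(2.0)
# for name in subreddits:
#     throttle.wait()
#     fetch(name)
```

Pacing alone does not avoid the 429 here, for the reason given in the answers, but it keeps a script within the documented rate once the user agent is fixed.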

So I figured I'd use time.sleep() to limit my requests to one page every 10 seconds. This fails as well.
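Since the exception carries the status code, a script can treat 429 specially instead of crashing. Below is a sketch of exponential backoff (a swapped-in technique, not something the question or answers use), written for Python 3, where urllib2's `HTTPError` lives in `urllib.error`. `open_with_backoff` and its parameters are hypothetical; `opener` is any function that fetches a URL. It honours a numeric `Retry-After` header if the server sends one (an assumption: not every server does) and otherwise doubles the delay on each attempt.

```python
import time
import urllib.error


def open_with_backoff(opener, url, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call opener(url), retrying with exponential backoff when the server answers HTTP 429."""
    for attempt in range(max_retries + 1):
        try:
            return opener(url)
        except urllib.error.HTTPError as e:
            # Only 429 is retried; other errors, or running out of retries, propagate.
            if e.code != 429 or attempt == max_retries:
                raise
            retry_after = e.headers.get('Retry-After') if e.headers else None
            if retry_after and retry_after.strip().isdigit():
                delay = float(retry_after)  # the server said how long to wait
            else:
                delay = base_delay * 2 ** attempt  # 2 s, 4 s, 8 s, ...
            sleep(delay)
```

In real use, `opener` might be `lambda u: urllib.request.urlopen(urllib.request.Request(u, headers=hdr)).read()`. Retrying only softens the symptom; while the default user agent is being sent, most requests may still be refused.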

The quote above is taken from the reddit API page, but I am not using the reddit API. At this point I see two possibilities: either that limit applies only to the reddit API, or urllib2 also has a limit.

Does anyone know which of these two it is? Or how I could get around this issue?


asked Nov 03 '12 20:11

Florin Stingaciu

2 Answers

From https://github.com/reddit/reddit/wiki/API:

Many default User-Agents (like "Python/urllib" or "Java") are drastically limited to encourage unique and descriptive user-agent strings.

This applies to regular requests as well. You need to supply your own user agent header when making the request.

import urllib2

url = 'http://www.reddit.com/r/science'  # example page
# TODO: change the user agent string to your own
hdr = {'User-Agent': 'super happy flair bot by /u/spladug'}
req = urllib2.Request(url, headers=hdr)
html = urllib2.urlopen(req).read()
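On Python 3, urllib2 was merged into `urllib.request`, and the same fix looks like this. A minimal sketch; `build_request` and the user agent string are placeholders to replace with your own, and only the commented-out line touches the network.

```python
import urllib.request

# Placeholder: replace with a unique, descriptive user agent for your script.
USER_AGENT = 'python:my-subreddit-reader:v0.1 (by /u/your_username)'


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that sends a custom User-Agent instead of the default Python-urllib one."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


# Usage (performs a network request):
# html = urllib.request.urlopen(build_request('https://www.reddit.com/r/science')).read()
```

`Request` stores header names via `str.capitalize()`, so the header is read back with `req.get_header('User-agent')`.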

However, this will create a new connection for every request. I suggest using another library that can reuse connections, for example httplib or Requests. It puts less stress on the server and speeds up the requests:

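import httplib
import time

lst = """
science
scifi
"""

hdr = {'User-Agent': 'super happy flair bot by /u/spladug'}
# One persistent connection is reused for all requests
conn = httplib.HTTPConnection('www.reddit.com')
for name in lst.split():
    conn.request('GET', '/r/' + name, headers=hdr)
    # Read the whole response before sending the next request on this connection
    print conn.getresponse().read()
    time.sleep(2)  # at most one request every two seconds
conn.close()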
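The same connection-reuse idea on Python 3, where httplib was renamed `http.client`. A sketch under stated assumptions: `fetch_subreddits`, its parameters, and the user agent string are hypothetical, and because `http.client` does not follow redirects you may need `http.client.HTTPSConnection` if the site redirects plain HTTP to HTTPS.

```python
import http.client
import time

# Placeholder user agent: replace with your own unique, descriptive string.
HEADERS = {'User-Agent': 'python:my-subreddit-reader:v0.1 (by /u/your_username)'}


def fetch_subreddits(names, host='www.reddit.com', port=None, headers=HEADERS, delay=2.0):
    """Fetch /r/<name> for every name over one persistent connection; return (status, body) pairs."""
    conn = http.client.HTTPConnection(host, port)
    results = []
    try:
        for i, name in enumerate(names):
            if i and delay:
                time.sleep(delay)  # stay under one request every two seconds
            conn.request('GET', '/r/' + name, headers=headers)
            resp = conn.getresponse()
            # The body must be read completely before the connection can be reused.
            results.append((resp.status, resp.read()))
    finally:
        conn.close()
    return results
```

Keeping the connection in a `try`/`finally` ensures it is closed even if one of the requests raises.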

answered Oct 07 '22 08:10

Anonymous Coward

reddit performs rate limiting per request (not per connection, as suggested by Anonymous Coward) for both IP addresses and user agents. The issue you are running into is that everyone who accesses reddit with urllib2's default user agent is rate limited as a single user.
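To see why a shared default user agent hurts, consider a toy sliding-window limiter keyed by user agent. This is a hypothetical model for illustration only, not reddit's actual implementation; `PerKeyRateLimiter` is an invented name, and the budget of 30 requests per minute comes from the quoted API rules. Scripts that all send `Python-urllib/2.7` (urllib2's default on Python 2.7) draw from one shared budget, while a script with its own user agent gets a separate one.

```python
from collections import defaultdict, deque


class PerKeyRateLimiter:
    """Toy sliding-window limiter: at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit=30, window=60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now):
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a real server would answer this request with HTTP 429
        hits.append(now)
        return True
```

A real limiter could key on the IP address as well; the point is only that requests sharing a key share a budget.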

The solution is to set your own user agent, which is covered by an answer to this question.

Alternatively, skip writing your own code to crawl reddit and use PRAW instead. It supports almost all the features of reddit's API, and you needn't worry about following the API rules because it takes care of that for you.

answered Oct 07 '22 08:10

bboe
