Frequently receiving 503 error when conducting Reddit search with PRAW

I'm using PRAW to view a large number of Reddit search results (both submissions and comments), and the method I'm using to collect the data is frequently generating a 503 error:

prawcore.exceptions.ServerError: received 503 HTTP response

As I understand it, if it were a rate limit issue, PRAW would throw a praw.errors.RateLimitExceeded error.
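For what it's worth, in current prawcore the two cases surface as different exception classes, so they can be told apart; a minimal sketch (the subreddit and query here are just placeholders):

import prawcore

try:
    results = list(reddit.subreddit('learnpython').search('praw', limit=100))
except prawcore.exceptions.TooManyRequests:
    # HTTP 429: an actual rate-limit response from Reddit
    print('rate limited')
except prawcore.exceptions.ServerError:
    # HTTP 5xx: the server itself failed, which is what I'm seeing
    print('server error')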

The function in which the error is produced is the following:

def search_subreddit(subreddit_name, last_post=None):
    params = {'sort': 'new', 'time_filter': 'year',
              'limit': 100, 'syntax': 'cloudsearch'}

    if last_post:
        start_time = 0 
        end_time = int(last_post.created) + 1
        query = 'timestamp:%s..%s' % (start_time, end_time)
    else: 
        query = ''

    return reddit.subreddit(subreddit_name).search(query, **params)

That's being called within a loop. Any idea as to why the 503 error is being generated, and how to prevent it from happening?
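The loop itself looks roughly like this (simplified; process() stands in for the real handling):

last_post = None
while True:
    results = list(search_subreddit('some_subreddit', last_post))
    if not results:
        break
    for submission in results:
        process(submission)  # placeholder for the real handling
    # results are sorted by new, so the last item is the oldest post
    last_post = results[-1]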

asked Dec 18 '22 by Dreadnaught

1 Answer

Why is it being generated?

503 is the HTTP status code for "Service Unavailable": the server is temporarily unable to handle the request. In almost all cases it means the server is overloaded and doesn't have the resources, at the moment of the request, to generate a response.
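You can confirm which status you actually got from the exception itself, since ServerError subclasses prawcore's ResponseException and carries the underlying HTTP response; a quick sketch (subreddit_name, query and params as in your function):

import prawcore

try:
    results = list(reddit.subreddit(subreddit_name).search(query, **params))
except prawcore.exceptions.ServerError as e:
    # ResponseException stores the underlying requests.Response
    print(e.response.status_code)  # e.g. 503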

How to prevent it from happening?

Since this is a server-side issue, and I'll assume here that you are not part of Reddit's networking team, there is nothing you can do directly to fix it. I'll try to list your possible options here:

  • Complain on social media that Reddit's servers suck (probably ineffective)
  • Try to reach Reddit's networking team and inform them about the issue (still ineffective, but it might do some good in the long term)
  • Suggest a feature to PRAW: keywords repeat_in_case_of_server_overload and repeat_in_case_of_server_overload_timeout, where setting the first to True (default False) would retry requests for some customizable amount of time (it would be interesting to see, but unlikely to get accepted in this form, and it would take some time to process)
  • Modify PRAW to do the thing described above yourself, then open a pull request on GitHub (you would have it immediately, but it still might not get accepted, and it requires a bit of work)
  • Run your script when Reddit's servers are less busy (that might honestly work if you run it manually and only need the data occasionally)
  • Add a simple mechanism that retries the search until it succeeds (this is probably the recommended one; see the sketch below)

Something like:

import time
import prawcore

result = None
last_exception = None
timeout = 900  # seconds = 15 minutes
time_start = time.monotonic()
while result is None and time.monotonic() < time_start + timeout:
    try:
        # search() returns a lazy ListingGenerator; materialize it so the
        # HTTP request (and any ServerError) actually happens inside the try
        result = list(reddit.subreddit(subreddit_name).search(query, **params))
    except prawcore.exceptions.ServerError as e:
        # back off before retrying; hammering an overloaded server won't help
        last_exception = e
        time.sleep(30)
if result is None:
    raise last_exception
return result

Also, the code above is more of a sketch, since I haven't tested it in any way; it may need tweaks, but hopefully it conveys the idea clearly.
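If you need this in more than one place, the same idea can be wrapped in a small helper; a sketch with exponential backoff (untested, the names are mine):

import time

import prawcore

def fetch_with_retry(make_request, timeout=900, initial_wait=30):
    """Call make_request() until it succeeds or timeout seconds pass.

    Doubles the wait between attempts so an overloaded server gets
    progressively more breathing room.
    """
    deadline = time.monotonic() + timeout
    wait = initial_wait
    while True:
        try:
            return make_request()
        except prawcore.exceptions.ServerError:
            if time.monotonic() + wait > deadline:
                raise
            time.sleep(wait)
            wait = min(wait * 2, 300)  # cap the backoff at 5 minutes

# usage:
# results = fetch_with_retry(
#     lambda: list(reddit.subreddit(subreddit_name).search(query, **params)))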

answered May 13 '23 by Tomasz Plaskota