 

Reddit search API not giving all results

import praw

def get_data_reddit(search):
    username=""
    password=""
    r = praw.Reddit(user_agent='')
    r.login(username,password,disable_warning=True)
    posts=r.search(search, subreddit=None,sort=None, syntax=None,period=None,limit=None)
    title=[]
    for post in posts:
        title.append(post.title)
    print(len(title))


search="stackoverflow"
get_data_reddit(search)
        

Output=953

Why the limitation?

The [documentation][1] mentions:

We can at most get 1000 results from every listing, this is an upstream limitation by reddit. There is nothing we can do to go past this limit. But we may be able to get the results we want with the search() method instead.

Is there any workaround? I am hoping for some way to overcome this through the API. I wrote a scraper for Twitter data and found that scraping is not the most efficient solution.

Same question: https://github.com/praw-dev/praw/issues/430 (please refer to that link for related discussion too).

[1]: https://praw.readthedocs.org/en/v2.0.15/pages/faq.html

Abhishek Bhatia asked Jun 23 '15 10:06



1 Answer

Limiting results on a search or listing is a common tactic for reducing load on servers. The reddit API is clear that this is what it does (as you have already flagged). However, it doesn't stop there.

The API also supports a variation of paged results for listings. Since it is a constantly changing database, they don't provide pages, but instead allow you to pick up where you left off by using the 'after' parameter. This is documented here.

Now, while I'm not familiar with PRAW, I see that the reddit search API conforms to the listing syntax. I think you therefore only need to reissue your search, specifying the extra 'after' parameter (referring to your last result from the first search).
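To make the 'after' idea concrete, here is a minimal sketch of cursor-style paging that runs without touching reddit at all. The names `fetch_all`, `make_fake_listing` and `fetch_page` are hypothetical helpers invented for this illustration, not part of PRAW or the reddit API; the only assumption carried over from reddit is that each item has a fullname (such as "t3_abc") under a "name" key, which is passed back as the cursor.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every item from a cursor-paged listing.

    fetch_page is a hypothetical callable: fetch_page(after, limit) returns a
    list of dicts, each with a "name" key holding the item's fullname.
    """
    items = []
    after = None  # no cursor on the first request
    while True:
        page = fetch_page(after, page_size)
        if not page:
            break  # an empty page means the listing is exhausted
        items.extend(page)
        after = page[-1]["name"]  # resume after the last item seen
    return items


def make_fake_listing(total):
    """Build an in-memory stand-in for a listing endpoint (illustration only)."""
    data = [{"name": "t3_%d" % i} for i in range(total)]

    def fetch_page(after, limit):
        start = 0
        if after is not None:
            names = [d["name"] for d in data]
            start = names.index(after) + 1
        return data[start:start + limit]

    return fetch_page


results = fetch_all(make_fake_listing(250), page_size=100)
print(len(results))  # 250
```

The loop stops on the first empty page, which is the piece that is easy to forget when paging by cursor. Against the real API the upstream cap would presumably still apply, so this shows the mechanics of paging rather than a way around the limit.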

Having subsequently tried it out, it appears PRAW is genuinely returning you all the results you asked for.

As requested by OP, here's the code I wrote to look at the paged results.

import praw

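def get_data_reddit(search, after=None):
    # Uses get_content and r.config['search'] from the PRAW version current at the time.
    r = praw.Reddit(user_agent='StackOverflow example')
    params = {"q": search}
    if after:
        # 'after' expects a fullname; "t3_" is the type prefix for a link (submission).
        params["after"] = "t3_" + str(after.id)
    posts = r.get_content(r.config['search'] % 'all', params=params, limit=100)
    return posts

search = "stackoverflow"
post = None
count = 0
while True:
    # Re-issue the search, resuming after the last post seen so far.
    posts = get_data_reddit(search, post)
    fetched = 0
    for post in posts:
        print(str(post.id))
        count += 1
        fetched += 1
    print(count)
    # Stop once a request returns nothing new; otherwise the loop never ends.
    if fetched == 0:
        break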
Peter Brittain answered Nov 14 '22 02:11
