 

PRAW 6: Get all submission of a subreddit

I'm trying to iterate over submissions of a certain subreddit from the newest to the oldest using PRAW. I used to do it like this:

import time

# Worked before PRAW 6.0.0; `reddit` is an existing praw.Reddit instance
subreddit = reddit.subreddit('LandscapePhotography')
for submission in subreddit.submissions(None, time.time()):
    print("Submission Title: {}".format(submission.title))

However, when I try to do it now I get the following error:

AttributeError: 'Subreddit' object has no attribute 'submissions'

From looking at the docs I can't seem to figure out how to do this. The best I can do is:

for submission in subreddit.new(limit=None):
    print("Submission Title: {}".format(submission.title))

However, this only returns the newest 1000 submissions.

Is there a way to do this with all submissions and not just the first 1000?

Curtwagner1984 asked Dec 31 '18 14:12



2 Answers

Unfortunately, Reddit removed this function from their API.

Check out the PRAW changelog. One of the changes in version 6.0.0 is:

Removed

  • Subreddit.submissions as the API endpoint backing the method is no more. See https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/.

The linked post says that Reddit is disabling Cloudsearch for all users:

Starting March 15, 2018 we’ll begin to gradually move API users over to the new search system. By end of March we expect to have moved everyone off and finally turn down the old system.

PRAW's Subreddit.submissions() used Cloudsearch to search for posts between the given timestamps. Since Cloudsearch has been removed and the search that replaced it doesn't support timestamp search, it is no longer possible to search by timestamp with PRAW or any other Reddit API client. That also rules out retrieving every post in a subreddit.
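A rough workaround inside the current API, sketched here as an assumption rather than anything PRAW itself provides: because subreddit.new() yields posts newest first, a client can check each submission's created_utc and stop as soon as it passes the start of the window. This still only sees what the ~1000-entry listing returns. The helper name in_time_window is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def in_time_window(submissions: Iterable[Any], after: float, before: float) -> Iterator[Any]:
    """Yield submissions with after <= created_utc < before.

    Assumes `submissions` is ordered newest first (as subreddit.new() is),
    so iteration stops at the first item older than `after`.
    """
    for submission in submissions:
        created = submission.created_utc
        if created >= before:
            continue  # too new: skip it and keep walking back in time
        if created < after:
            break  # everything after this is older, so stop pulling more pages
        yield submission


if __name__ == "__main__":
    # Stand-in objects with the two attributes the helper reads
    fake = [SimpleNamespace(title=t, created_utc=ts)
            for t, ts in [("c", 300.0), ("b", 200.0), ("a", 100.0)]]
    print([s.title for s in in_time_window(fake, after=150.0, before=300.0)])  # prints ['b']
```

With PRAW it would be called as in_time_window(reddit.subreddit('LandscapePhotography').new(limit=None), after, before), where reddit is an authenticated praw.Reddit instance and both bounds are Unix timestamps. Stopping early matters because the listing is fetched lazily, page by page.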

For more information, see this thread from /r/redditdev posted by the maintainer of PRAW.


Alternatives

Since Reddit limits all listings to ~1000 entries, it is currently impossible to get all posts in a subreddit using their API. However, third-party datasets with APIs exist, such as pushshift.io. As /u/kungming2 said on Reddit:

You can use Pushshift.io to still return data from defined time periods by using their API:

https://api.pushshift.io/reddit/submission/search/?after=1334426439&before=1339696839&sort_type=score&sort=desc&subreddit=translator

This, for example, allows you to parse submissions to r/translator between 2012-04-14 and 2012-06-14.
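To build a query like the one above for a different subreddit or date range, the epoch timestamps and the URL can be produced with the Python standard library. This is a minimal sketch: the helper names to_epoch and search_url are hypothetical, and the only parameters used are the ones that appear in the quoted URL. Whether api.pushshift.io still answers such requests from the public is not guaranteed.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint taken from the quoted example URL
PUSHSHIFT_SEARCH = "https://api.pushshift.io/reddit/submission/search/"


def to_epoch(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole Unix seconds."""
    return int(dt.timestamp())


def search_url(subreddit: str, after: int, before: int, **params: object) -> str:
    """Build a submission-search URL; extra keyword arguments become query parameters."""
    # Dict order is preserved, so the parameters come out in the same order as the example
    query = {"after": after, "before": before, **params, "subreddit": subreddit}
    return PUSHSHIFT_SEARCH + "?" + urlencode(query)


if __name__ == "__main__":
    start = datetime(2012, 4, 14, 18, 0, 39, tzinfo=timezone.utc)
    print(to_epoch(start))  # prints 1334426439
    print(search_url("translator", 1334426439, 1339696839, sort_type="score", sort="desc"))
```

The second print reproduces the quoted URL exactly, which is a quick way to check the timestamps: 1334426439 and 1339696839 are both 18:00:39 UTC, on 2012-04-14 and 2012-06-14 respectively.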

jarhill0 answered Sep 24 '22 09:09


You can retrieve all the data from pushshift.io with an iterative loop. Set the before parameter to the current epoch time and request 1000 items, then pass the created_utc of the last item in the result as the before parameter to get the next 1000 items. Keep going until no more results are returned.
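The loop described above could be sketched in Python as follows. The paging logic takes the fetch step as a parameter so it can be tried without network access. pushshift_fetcher is a hypothetical helper, and its size, sort_type and sort parameters, as well as the {"data": [...]} response shape, are assumptions about the Pushshift API rather than confirmed behaviour. Sorting by created_utc in descending order is what makes the last item of each page the oldest one.

```python
import json
from typing import Callable, Iterator, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

Page = List[dict]


def paginate_before(fetch_page: Callable[[int], Page], start_before: int) -> Iterator[dict]:
    """Walk backwards in time: each page's last created_utc becomes the next `before`.

    `fetch_page(before)` must return items sorted newest first, each with a
    `created_utc` field; an empty page ends the loop.
    """
    before = start_before
    while True:
        page = fetch_page(before)
        if not page:
            return  # no more results
        yield from page
        oldest = int(page[-1]["created_utc"])
        if oldest >= before:
            return  # guard: no progress was made, stop instead of looping forever
        before = oldest


def pushshift_fetcher(subreddit: str, size: int = 1000) -> Callable[[int], Page]:
    """Hypothetical fetcher for the Pushshift search endpoint (assumed parameters)."""
    def fetch(before: int) -> Page:
        query = urlencode({
            "subreddit": subreddit,
            "before": before,
            "size": size,  # 1000 as in the description; the service may return fewer
            "sort_type": "created_utc",
            "sort": "desc",
        })
        req = Request("https://api.pushshift.io/reddit/submission/search/?" + query,
                      headers={"User-Agent": "pagination-sketch/0.1"})
        with urlopen(req, timeout=30) as resp:
            return json.load(resp)["data"]  # assumed response shape: {"data": [...]}
    return fetch


# Example use (performs network requests):
#   import time
#   for post in paginate_before(pushshift_fetcher("LandscapePhotography"), int(time.time())):
#       print(post["title"])
```

Because before excludes the timestamp it is given, posts that share the exact created_utc of a page's last item but did not fit on that page would be skipped. The page size does not affect correctness otherwise, since the loop only stops on an empty page.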

For further information, see: https://www.reddit.com/r/pushshift/comments/b7onr6/max_number_of_results_returned_per_query/

Nez answered Sep 22 '22 09:09