
Obtaining reddit data [closed]

I am interested in obtaining data from different reddit subreddits. Does anyone know if there is a reddit (or other) API, similar to Twitter's, that can be used to crawl all the pages?

Budhapest asked Jan 14 '13 16:01



2 Answers

Yes, reddit has an API that can be used for a variety of purposes such as data collection, automatic commenting bots, or even to assist in subreddit moderation.

There are a few places to discover information on reddit's API:

  • github reddit wiki -- provides the overview and rules for using reddit's API (follow the rules)
  • automatically generated API docs -- provides information on the requests needed to access most of the API endpoints
  • /r/redditdev -- the reddit community dedicated to answering questions both about reddit's source code and about reddit's API

If there is a particular programming language you are already familiar with, you should check out the existing set of API wrappers for various languages. Despite my bias (I am the package maintainer), I am quite certain that PRAW, for Python, supports the largest number of reddit API features.

bboe answered Nov 10 '22 00:11


Note that if you are only reading data, and not interested in posting back to reddit, you can get quite a bit of data from the JSON feeds associated with each subreddit. With this method, you don't need to worry about an API at all -- you simply request the relevant JSON file and parse it in your language of choice.

Here's an example URL that will return a JSON object containing the top posts from the Justrolledintotheshop subreddit: https://www.reddit.com/r/Justrolledintotheshop/top.json

In place of top, you can use hot, new, or controversial. When using top, you can add ?t=day to the end of the URL to restrict the results to the top posts of the past day. Other valid values are hour, week, month, year, or all.
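To make the URL scheme concrete, here is a minimal Python sketch using only the standard library. The function names (build_listing_url, fetch_listing, extract_titles) and the User-Agent string are my own placeholders, not anything reddit defines, and the parsing assumes the feed has the usual listing shape, with posts under data.children and each post's fields under its own data key.

```python
import json
import urllib.parse
import urllib.request

# Sort orders and time ranges as listed in the answer above.
VALID_SORTS = {"hot", "new", "top", "controversial"}
VALID_RANGES = {"hour", "day", "week", "month", "year", "all"}


def build_listing_url(subreddit, sort="top", time_range=None):
    """Build the JSON feed URL for a subreddit listing."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unknown sort: {sort!r}")
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json"
    if time_range is not None:
        if time_range not in VALID_RANGES:
            raise ValueError(f"unknown time range: {time_range!r}")
        # Appends e.g. "?t=day" to the URL.
        url += "?" + urllib.parse.urlencode({"t": time_range})
    return url


def fetch_listing(url, user_agent="example-script/0.1 (placeholder)"):
    """Download and decode one listing. The User-Agent value is a
    placeholder; replace it with one that identifies your script."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def extract_titles(listing):
    """Return post titles, assuming the data.children[i].data.title layout."""
    return [child["data"]["title"] for child in listing["data"]["children"]]
```

Something like extract_titles(fetch_listing(build_listing_url("Justrolledintotheshop", "top", "day"))) should then give the titles of the day's top posts, provided the request is not rate limited or blocked.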

Haydentech answered Nov 09 '22 23:11