I have a function to scrape all the posts in the Bitcoin subreddit between 2014-11-01 and 2015-10-31.
However, I'm only able to extract about 990 posts that go back only to October 25. I don't understand what's happening. I included a Sys.sleep of 15 seconds between each extract after referring to https://github.com/reddit/reddit/wiki/API, to no avail.
Also, I experimented with scraping from another subreddit (fitness), but it also returned around 900 posts.
require(jsonlite)
require(dplyr)
getAllPosts <- function() {
url <- "https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A1414800000..1446335999&sort=new&restrict_sr=on&rank=title&syntax=cloudsearch&limit=100"
extract <- fromJSON(url)
posts <- extract$data$children$data %>% dplyr::select(name, author, num_comments, created_utc,
title, selftext)
after <- posts[nrow(posts),1]
url.next <- paste0("https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A1414800000..1446335999&sort=new&restrict_sr=on&rank=title&syntax=cloudsearch&after=",after,"&limit=100")
extract.next <- fromJSON(url.next)
posts.next <- extract.next$data$children$data
# execute while loop as long as there are any rows in the data frame
while (!is.null(nrow(posts.next))) {
posts.next <- posts.next %>% dplyr::select(name, author, num_comments, created_utc,
title, selftext)
posts <- rbind(posts, posts.next)
after <- posts[nrow(posts),1]
url.next <- paste0("https://www.reddit.com/r/bitcoin/search.json?q=timestamp%3A1414800000..1446335999&sort=new&restrict_sr=on&rank=title&syntax=cloudsearch&after=",after,"&limit=100")
Sys.sleep(15)
extract <- fromJSON(url.next)
posts.next <- extract$data$children$data
}
posts$created_utc <- as.POSIXct(posts$created_utc, origin="1970-01-01")
return(posts)
}
posts <- getAllPosts()
Does reddit have some kind of limit that I'm hitting?
Yes, all reddit listings (posts, comments, etc.) are capped at 1000 items; they're essentially just cached lists, rather than queries, for performance reasons.
To get around this, you'll need to do some clever searching based on timestamps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With