 

Avoid Rate limit with rtweet get_timeline()

Is there any way to stop my loop from being interrupted by the rate limit? Ideally the code would wait until the rate-limit window has passed before continuing.

A side question: I thought about parallelizing the for loop. Do you think that would be a good idea? I was not sure whether data could end up written to the wrong file.

library(rtweet)
create_token(app = "Arconic Influential Followers", consumer_key, consumer_secret)

flw  <- get_followers("arconic")
fds  <- get_friends("arconic")
usrs <- lookup_users(c(flw$user_id, fds$user_id))

for (i in seq_along(usrs$user_id)) {
    a <- tryCatch(get_timeline(usrs$user_id[i]),
                  error = function(e) message(e))
    tryCatch(save_as_csv(a, usrs$user_id[i]),
             error = function(e) message(e))
}
asked Feb 03 '17 by Brent Ferrier

2 Answers

I was able to resolve it by wrapping the get_timeline() call in the function below. get_timeline_unlimited calls itself recursively after waiting the time required for the rate limit to reset. So far it has worked well for me with no issues.

library(rtweet)
library(glue)   # for glue() string interpolation
library(dplyr)  # for bind_rows()

get_timeline_unlimited <- function(users, n) {

  if (length(users) == 0) {
    return(NULL)
  }

  rl <- rate_limit(query = "get_timeline")

  if (length(users) <= rl$remaining) {
    print(glue("Getting data for {length(users)} users"))
    tweets <- get_timeline(users, n, check = FALSE)
  } else {
    if (rl$remaining > 0) {
      # Use up the remaining quota before sleeping
      users_first <- users[1:rl$remaining]
      users_rest  <- users[-(1:rl$remaining)]
      print(glue("Getting data for {length(users_first)} users"))
      tweets_first <- get_timeline(users_first, n, check = FALSE)
      rl <- rate_limit(query = "get_timeline")
    } else {
      tweets_first <- NULL
      users_rest   <- users
    }
    wait <- rl$reset + 0.1  # rl$reset is in minutes; pad slightly
    print(glue("Waiting for {round(wait, 2)} minutes"))
    Sys.sleep(wait * 60)

    tweets_rest <- get_timeline_unlimited(users_rest, n)
    tweets <- bind_rows(tweets_first, tweets_rest)
  }
  return(tweets)
}
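For example, a minimal usage sketch (assuming a token is already set up as in the question; the n value and "timelines.csv" file name are illustrative, not part of the original answer):

```r
# Fetch up to 3200 tweets per user for the combined follower/friend list
# from the question, letting the wrapper sleep through rate-limit resets.
timelines <- get_timeline_unlimited(usrs$user_id, n = 3200)
save_as_csv(timelines, "timelines.csv")
```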
answered Sep 17 '22 by Sasha

What I ended up doing was creating a while loop that checked the number of records left in my Users vector, ran my for loop, and then put the system to sleep for 15 minutes. This approach is good, but there are some things to account for. I have the while loop break at 200 in case some users had no data to save to a csv. That turned out to be a good move because, if you notice, the for loop starts iterating at 80: as you move across your vector of users, the good users are removed iteratively, leaving only the users that cause errors. An improvement for someone up to the task would be to handle this programmatically.

Users <- usrs$user_id
# list.files() returns the csv files already saved; strip the 11-character
# file suffix added by save_as_csv to recover the user IDs already done
goodUsers <- substring(list.files(), 1, nchar(list.files()) - 11)
Users <- setdiff(Users, goodUsers)

while (length(Users) > 200) {
    for (i in 80:length(Users)) {
        a <- tryCatch(get_timeline(Users[i], usr = FALSE),
                      error = function(e) message(e))
        tryCatch({
            save_as_csv(a, Users[i])
            goodUsers <- append(goodUsers, Users[i])
        }, error = function(e) message(e))
    }
    Users <- setdiff(Users, goodUsers)
    Sys.sleep(900)  # 15 minutes
}

length(Users)
length(goodUsers)
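One possible shape for that programmatic fix, as a sketch: track users that error in a separate vector and remove them from the work list along with the good ones, so the while loop shrinks even when saves keep failing. The badUsers vector and shrink_worklist helper are names I am introducing here for illustration; they are not part of the original code.

```r
# badUsers would be collected in the loop's error handler, e.g.
#   error = function(e) { badUsers <<- append(badUsers, Users[i]); message(e) }
# Then shrink the work list by both successes and failures each pass:
shrink_worklist <- function(Users, goodUsers, badUsers) {
  setdiff(Users, union(goodUsers, badUsers))
}

# Toy example with placeholder IDs:
remaining <- shrink_worklist(c("u1", "u2", "u3", "u4"),
                             goodUsers = c("u1"),
                             badUsers  = c("u3"))
# remaining is c("u2", "u4")
```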
answered Sep 19 '22 by Brent Ferrier