Problems with non-UTF-8 or non-ASCII characters in the twitteR package in R

Tags: r, utf-8, twitter

In a previous question I asked about downloading a large number of Twitter followers (and their location, date of creation, number of followers, etc.) from the Haaretz Twitter feed (@haaretzcom) using the twitteR package in R (see "Work around rate limit for extracting large list of user information using twitteR package in R"). The Twitter feed has over 90,000 followers, and I was able to download the full list of follower IDs without a problem. The code below then looks up the information for each follower.

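   require(twitteR)
   require(ROAuth)

   # Load the saved Twitter OAuth credentials
   load("~/Dropbox/Twitter/my_oauth")

   # Register the OAuth credentials with twitteR
   registerTwitterOAuth(my_oauth)

   # Download the full list of follower IDs
   haaretz_followers <- getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit=9999999)

   # Look up the followers in batches of 100 IDs, pausing between requests
   # (the original loop repeated the full lookup once per follower)
   batches <- split(haaretz_followers, ceiling(seq_along(haaretz_followers) / 100))
   haaretz_followers_info <- list()
   for (batch in batches) {
     Sys.sleep(5)
     haaretz_followers_info <- c(haaretz_followers_info, lookupUsers(batch))
   }

   # Convert the list of user objects to a data frame
   haaretz_followers_full <- twListToDF(haaretz_followers_info)

   # Export the data to CSV
   write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv", sep = ",")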

The code succeeds in extracting many of the users. However, whenever it hits a certain user, I get the following error:

Error in twFromJSON(out) :
RMate stopped at line 51
Error: Malformed response from server, was not JSON.
RMate stopped at line 51
The most likely cause of this error is Twitter returning a character which
can't be properly parsed by R. Generally the only remedy is to wait long
enough for the offending character to disappear from searches (e.g. if
using searchTwitter()).
Calls: twListToDF ... lookupUsers -> lapply -> FUN -> <Anonymous> -> twFromJSON
Execution halted

I run into this problem even if I load the RJSONIO package after the twitteR package. From some research, it appears that the twitteR and RJSONIO packages have problems parsing non-UTF-8 or non-ASCII characters (Arabic, etc.); see http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/2013-May/000335.html for details. Is there a way to simply ignore non-UTF-8 or non-ASCII characters in the code I have and still extract all the follower information? Any help would be much appreciated.
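One possible workaround, sketched here under assumptions rather than taken from the twitteR documentation, is to wrap each batch lookup in `tryCatch()` so that a batch containing an unparseable user is skipped and recorded instead of halting the whole run. The helper name `lookup_in_batches` and the stand-in function `fake_lookup` are hypothetical; in real use the lookup function would be twitteR's `lookupUsers`.

```r
# Hypothetical helper (not part of twitteR): look up IDs in batches and
# skip any batch whose lookup raises an error, instead of halting the run.
lookup_in_batches <- function(ids, lookup_fn, batch_size = 100, pause = 0) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- list()
  failed <- list()
  for (batch in batches) {
    Sys.sleep(pause)
    out <- tryCatch(
      lookup_fn(batch),
      error = function(e) {
        message("Skipping batch starting at ", batch[1], ": ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(out)) {
      failed <- c(failed, list(batch))   # remember the IDs for a later retry
    } else {
      results <- c(results, out)
    }
  }
  list(results = results, failed = failed)
}

# Stand-in for lookupUsers() so the sketch runs without Twitter access:
# it fails on any batch that contains the ID "bad".
fake_lookup <- function(ids) {
  if ("bad" %in% ids) stop("Malformed response from server, was not JSON.")
  as.list(ids)
}

res <- lookup_in_batches(c("1", "2", "bad", "4", "5"), fake_lookup, batch_size = 2)
length(res$results)  # users recovered from the batches that succeeded
length(res$failed)   # batches that were skipped
```

With twitteR loaded, the call would look like `lookup_in_batches(haaretz_followers, lookupUsers, pause = 5)`. The batches stored in `failed` can be retried with `batch_size = 1` to isolate the individual users that trigger the error, at the cost of more requests against the rate limit.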

Thomas asked Nov 13 '22 05:11



1 Answer

There is a package update (1.1.7) that addresses this issue. See: https://github.com/geoffjentry/twitteR/blob/master/NEWS
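To check whether the installed copy already includes that update, a small sketch like the following may help. It only compares version numbers and prints a hint; it assumes the package can still be installed from your configured repository, and the variable name `required` is just for illustration.

```r
# Version that the answer says contains the fix
required <- package_version("1.1.7")

# requireNamespace() returns FALSE if twitteR is not installed at all;
# packageVersion() is only evaluated when the package is present.
if (!requireNamespace("twitteR", quietly = TRUE) ||
    packageVersion("twitteR") < required) {
  message("twitteR is missing or older than ", required,
          "; consider running install.packages(\"twitteR\")")
} else {
  message("twitteR ", packageVersion("twitteR"), " is recent enough")
}
```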

SPi answered Nov 15 '22 07:11
