How to scrape a huge amounts of tweets

I am building a project in Python that needs to scrape huge amounts of Twitter data. Something like 1 million users and all their tweets need to be scraped.

Previously I have used Tweepy and Twython, but hit Twitter's rate limits very quickly.

How do sentiment analysis companies and the like get their data? How do they get all those tweets? Do you buy the data somewhere, or build something that rotates through different proxies?

How do companies like Infochimps get all their data, for example for the TrstRank dataset? * http://www.infochimps.com/datasets/twitter-census-trst-rank

asked Dec 09 '22 by Javaaaa

1 Answer

If you want the latest tweets from specific users, Twitter offers the Streaming API.

The Streaming API is the real-time sample of the Twitter Firehose. This API is for those developers with data intensive needs. If you're looking to build a data mining product or are interested in analytics research, the Streaming API is most suited for such things.
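Whatever library makes the connection, the consumer side looks roughly the same: tweets arrive continuously and you append them to durable storage in batches. Here is a minimal sketch of that consumer, assuming the stream is modeled as a plain iterable of tweet dicts (in practice a library such as Tweepy would feed these from its callback); the JSON-lines format and the batch size of 1000 are arbitrary choices for illustration:

```python
import json

def consume_stream(tweets, out_path, batch_size=1000):
    """Append tweets from a (possibly endless) iterable to a JSON-lines
    file, flushing in batches so a crash loses at most one batch.
    Returns the number of tweets written."""
    batch = []
    written = 0
    with open(out_path, "a", encoding="utf-8") as out:
        for tweet in tweets:
            batch.append(json.dumps(tweet))
            if len(batch) >= batch_size:
                out.write("\n".join(batch) + "\n")
                written += len(batch)
                batch = []
        if batch:  # flush the tail when the stream ends
            out.write("\n".join(batch) + "\n")
            written += len(batch)
    return written
```

The batching matters at the scale the question describes: writing a million tweets one line at a time is dominated by I/O overhead, while batched appends to a JSON-lines file keep the consumer fast enough that the stream never backs up.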

If you're trying to access older tweets, the REST API, with its strict request limits, is the only way to go.
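Working under those limits mostly means paging through results and sleeping out the rate-limit window when the API refuses a request. The sketch below shows that loop under stated assumptions: `fetch_page` and `RateLimited` are hypothetical stand-ins for whatever your HTTP layer provides (not real Tweepy/Twython names), and the 900-second default reflects Twitter's 15-minute rate-limit windows:

```python
import time

class RateLimited(Exception):
    """Hypothetical error raised by fetch_page on an HTTP 429 response."""

def fetch_all(fetch_page, max_pages=100, wait=900, sleep=time.sleep):
    """Page through a rate-limited endpoint.

    fetch_page(cursor) is assumed to return (items, next_cursor) and to
    raise RateLimited when the window is exhausted; on that error we
    sleep out the window and retry the same cursor.
    """
    items, cursor = [], None
    for _ in range(max_pages):
        try:
            page, cursor = fetch_page(cursor)
        except RateLimited:
            sleep(wait)        # wait for the rate-limit window to reset
            continue           # retry the same cursor
        items.extend(page)
        if cursor is None:     # no more pages
            break
    return items
```

Injecting `sleep` as a parameter is a small design choice that makes the retry logic testable without actually waiting 15 minutes; the same loop works for any cursor-based endpoint, not just Twitter's.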

answered Dec 25 '22 by Cody Hess