Filter Twitter feeds only by language

I am using the Tweepy API to extract Twitter feeds. I want to extract all tweets in a specific language only. The language filter works only if a track filter is also provided. The following code returns a 406 error:

    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(languages=["en"])

How can I extract all tweets in a certain language using Tweepy?

Sudo asked Nov 12 '14 at 15:11


1 Answer

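You can't, at least not without special access. The 406 comes back because the streaming filter endpoint requires at least one of track, follow, or locations; languages only narrows what those filters match. Streaming all tweets unfiltered requires a connection to the firehose, which Twitter grants only for specific use cases. Honestly, the firehose isn't really necessary: proper use of track can get you more tweets than you know what to do with.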

Try using something like this:

stream.filter(languages=["en"], track=["a", "the", "i", "you", "u"]) # etc 

Filtering on common words like these will get you many, many tweets. If you want real data on the most-used words, check out this article from Time: The 500 Most Frequently Used Words on Twitter. You can track up to 400 keywords, but a list that broad will likely run into the streaming cap of 1% of all tweets at any given time. Even if your track parameter matches 60% of all tweets at a given moment, you will still receive only 1% (which is a LOT of tweets).
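If you build the keyword list programmatically (say, from a word-frequency list), it may help to validate it before opening the stream. Below is a minimal sketch, assuming the 400-keyword limit mentioned above; build_filter_params and MAX_TRACK_TERMS are hypothetical names for illustration and are not part of Tweepy.

```python
# Hypothetical helper, not part of Tweepy: prepares arguments for stream.filter().
MAX_TRACK_TERMS = 400  # keyword limit quoted in this answer (assumption)


def build_filter_params(languages, track):
    """Return keyword arguments suitable for a stream.filter(...) call."""
    # Drop blank entries and duplicates while keeping the original order.
    terms = list(dict.fromkeys(t.strip() for t in track if t.strip()))
    if not terms:
        # A language filter on its own is what produces the 406 in the question.
        raise ValueError("track must contain at least one keyword")
    if len(terms) > MAX_TRACK_TERMS:
        raise ValueError(
            f"track has {len(terms)} terms; the limit is {MAX_TRACK_TERMS}"
        )
    return {"languages": list(languages), "track": terms}


params = build_filter_params(["en"], ["a", "the", "i", "you", "u", "the"])
print(params)
```

With a stream set up as in the question, the result could then be passed along as stream.filter(**params).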

Luigi answered Nov 17 '22 at 22:11