How to get only English tweets using Python?

Here is my current code:

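from twitter import *

# ACCESS_TOKEN, ACCESS_TOKEN_SECRET, TWITTER_CONSUMER_KEY and
# TWITTER_CONSUMER_SECRET are assumed to be defined elsewhere.
# OAuth() expects the access token pair first, then the consumer key pair.
t = Twitter(auth=OAuth(ACCESS_TOKEN, ACCESS_TOKEN_SECRET,
                       TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET))

query = input("enter the query \n")
data = t.search.tweets(q=query)

# One search request returns at most 100 tweets (15 by default), so iterate
# over whatever came back instead of indexing up to a fixed count of 1000.
for status in data['statuses']:
    print(status['text'])
    print()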

This fetches tweets in all languages. Is there a way to restrict it to fetching only English tweets?

asked Dec 14 '13 04:12 by Sooraj (score: 313)


1 Answer

There are at least four ways; I've listed them in order of simplicity.

  1. After you collect the tweets, note that the JSON for each tweet has a 'lang' key that identifies its language. So you can fetch tweets in all languages and keep only the ones tagged as English, with something like this:

    for status in data['statuses']:
        # 'lang' holds Twitter's language tag for the tweet; it may be absent
        if status.get('lang') == 'en':
            print(status['text'])
            print()

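  2. Another way is to collect only tweets that are identified as English in the first place: the search API takes an optional 'lang' parameter that restricts results to tweets Twitter has identified as being in that language (see Twitter's search API documentation for details). With the twitter package used in the question, keyword arguments are passed through as API parameters, so the call becomes t.search.tweets(q=query, lang='en'). If you are using the python-twitter library, you can set the 'lang' parameter in twitter.py.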

  3. Run each tweet's text through a language-detection package such as guess-language.

  4. If you want to recognize English text without relying on Twitter's language data (e.g. a Chinese account that is writing in English), you have to do some natural language processing yourself. One option is a method that recognizes common English words and then marks the text as English.
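Option 4 can be sketched as a simple stopword-ratio heuristic in plain Python. This is a minimal illustration under stated assumptions, not the specific method the answer refers to: the word list and the 0.2 threshold are arbitrary choices, and english_score, looks_english and filter_english are hypothetical helper names. It looks only at the tweet text and ignores Twitter's 'lang' field entirely.

```python
import re

# Assumption: a small hand-picked set of very common English function words,
# not a standard or complete stopword list.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "from", "have", "he", "i", "in", "is", "it", "its", "me", "my",
    "not", "of", "on", "or", "she", "so", "that", "the", "this", "to",
    "was", "we", "were", "what", "with", "you", "your",
}

# Runs of Unicode letters (no digits or underscores), so accented words
# such as "cómo" stay whole instead of being split.
WORD_RE = re.compile(r"[^\W\d_]+")


def english_score(text):
    """Return the fraction of words in `text` that are common English words."""
    # Drop URLs and @mentions, which are not natural-language words.
    cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    words = WORD_RE.findall(cleaned)
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in ENGLISH_STOPWORDS)
    return hits / len(words)


def looks_english(text, threshold=0.2):
    """Heuristically treat `text` as English if enough of its words are stopwords."""
    return english_score(text) >= threshold


def filter_english(statuses, threshold=0.2):
    """Keep statuses whose 'text' passes the heuristic (hypothetical helper)."""
    return [s for s in statuses if looks_english(s.get("text", ""), threshold)]
```

With the question's variables, filter_english(data['statuses']) would keep only the search results whose text passes the check. Tweets are short, so a word-ratio score like this is noisy; a dedicated detector (option 3) is likely to be more robust.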

answered Sep 27 '22 18:09 by philshem (score: 86)