Filtering of tweets received from statuses/filter (streaming API)

Tags: python, twitter, tweepy, tweetstream

I am tracking N different keywords (for the sake of simplicity, let N=3), so in GET statuses/filter I pass 3 keywords in the "track" argument.

The tweets I receive can match ANY of those 3 keywords. The problem is that I want to work out which tweet corresponds to which keyword, i.e. build a mapping between each tweet and the keyword(s) from the "track" argument that it matched.

Apparently, there is no way to do this without processing the received tweets myself.

So what is the best way to do this processing? Should I search for the keywords in the text of the tweet? How should I handle case-insensitivity? What about a keyword made of multiple words, e.g. "Katrina Kaif"?

I am currently trying to formulate a regular expression...

I think the BEST way would be to use the same logic (regular expressions etc.) that the statuses/filter API itself uses. How can I find out what logic the Twitter statuses/filter API uses to match tweets to keywords?

Advice? Help?

P.S.: I am using Python, Tweepy, regular expressions, and MongoDB/Apache S4 (for distributed computing).

asked May 17 '13 06:05

user1599964

1 Answer

The first thing that comes to mind is to create a separate stream for each keyword and run each one in its own thread, so that every tweet arriving on a stream is known to match that stream's keyword:

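from threading import Thread
import tweepy


# Note: tweepy.StreamListener is the tweepy 3.x API (it was removed in tweepy 4.0).
class StreamListener(tweepy.StreamListener):
    def __init__(self, keyword, api=None):
        super(StreamListener, self).__init__(api)
        self.keyword = keyword  # the single keyword this stream tracks

    def on_status(self, tweet):
        # Every tweet delivered on this stream matched self.keyword.
        # (on_data is not overridden, so tweepy dispatches to on_status.)
        print(self.keyword, tweet.text)

    def on_error(self, status_code):
        print('Error: ' + repr(status_code))
        return False  # returning False stops the stream


def start_stream(auth, track):
    tweepy.Stream(auth=auth, listener=StreamListener(track)).filter(track=[track])


# Replace the placeholder strings with your own credentials.
auth = tweepy.OAuthHandler('<consumer_key>', '<consumer_secret>')
auth.set_access_token('<key>', '<secret>')

# One thread (and one streaming connection) per keyword.
# Twitter may limit simultaneous streaming connections per account.
track = ['obama', 'cats', 'python']
for item in track:
    thread = Thread(target=start_stream, args=(auth, item))
    thread.start()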

If you still want to map tweets to keywords yourself within a single stream, Twitter's documentation for the track request parameter describes how it matches tweets. There are some edge cases that could cause problems.
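As a rough illustration of the single-stream approach, here is a minimal sketch of client-side matching. It assumes the documented track semantics: each phrase matches when all of its space-separated terms appear in the tweet, in any order and ignoring case. The helper names `tokenize` and `matched_keywords` are made up for this example, and the `\w+` tokenizer only approximates whatever tokenization Twitter applies. It also looks at the tweet text only, whereas the API may match on other fields such as expanded URLs.

```python
import re

# Word tokens: runs of letters, digits, or underscores. This is only an
# approximation of Twitter's own tokenization (an assumption).
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Return the set of lowercased word tokens found in text."""
    return set(TOKEN_RE.findall(text.lower()))


def matched_keywords(tweet_text, track):
    """Return the track phrases whose terms all occur in tweet_text."""
    tokens = tokenize(tweet_text)
    matches = []
    for phrase in track:
        terms = tokenize(phrase)
        # A phrase matches when every one of its terms is present,
        # regardless of order and case.
        if terms and terms <= tokens:
            matches.append(phrase)
    return matches


track = ["obama", "cats", "Katrina Kaif"]
print(matched_keywords("Kaif, Katrina spotted with CATS", track))
# prints ['cats', 'Katrina Kaif']
```

Comparing token sets rather than running a substring search avoids false positives such as "cats" matching "catsup", and it handles multi-word keywords like "Katrina Kaif" without a hand-written regular expression per keyword.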

Hope that helps.

answered Sep 18 '22 01:09

alecxe
