I have N different keywords that I am tracking (for the sake of simplicity, let N=3). So in GET statuses/filter, I pass 3 keywords in the "track" argument.
The tweets I receive can match ANY of the 3 keywords. The problem is that I want to work out which tweet corresponds to which keyword, i.e. a mapping between each tweet and the keyword(s) from the "track" argument that it matched.
Apparently, there is no way to do this without doing any processing on the tweets received.
So I was wondering what the best way to do this processing is. Search for the keywords in the text of the tweet? What about case-insensitivity? What about a keyword made up of multiple words, e.g. "Katrina Kaif"?
I am currently trying to formulate some regular expression...
I was thinking the BEST way would be to use the same logic (regular expressions etc.) that the statuses/filter API itself uses. How can I find out what logic the Twitter statuses/filter API uses to match tweets to the keywords?
Advice? Help?
P.S.: I am using Python, Tweepy, regex, MongoDB/Apache S4 (for distributed computing)
The first thing that comes to mind is to create a separate stream for every keyword and start each one in its own thread, like this:
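    from threading import Thread
    import tweepy

    class StreamListener(tweepy.StreamListener):
        # One listener per keyword, so every tweet it receives is tagged with that keyword.
        def __init__(self, keyword, api=None):
            super(StreamListener, self).__init__(api)
            self.keyword = keyword

        def on_status(self, tweet):
            print('Ran on_status')

        def on_error(self, status_code):
            print('Error: ' + repr(status_code))
            return False  # returning False stops this stream

        def on_data(self, data):
            # on_data is overridden without calling the parent method,
            # so the raw payload is handled here and on_status is not dispatched.
            print(self.keyword, data)
            print('Ok, this is actually running')

    def start_stream(auth, track):
        # Blocks for as long as the stream is open, hence one thread per keyword.
        tweepy.Stream(auth=auth, listener=StreamListener(track)).filter(track=[track])

    # Replace the placeholders with your own credentials.
    auth = tweepy.OAuthHandler('<consumer_key>', '<consumer_secret>')
    auth.set_access_token('<key>', '<secret>')

    track = ['obama', 'cats', 'python']
    for item in track:
        thread = Thread(target=start_stream, args=(auth, item))
        thread.start()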
If you still want to distinguish tweets by keyword yourself in a single stream, see Twitter's documentation on how it uses the track request parameter. There are some edge cases that could cause problems.
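For the single-stream route, here is a minimal sketch of the matching step. It assumes the track behaviour as I understand it from Twitter's documentation: commas separate phrases (logical OR), spaces inside a phrase are a logical AND, and terms are matched against whole tokens regardless of case and order. The helper names `tokenize` and `match_keywords` are made up for this example, and the tokenisation is only an approximation: it looks at the tweet text alone, whereas Twitter also matches against things like expanded URLs and mentioned screen names.

```python
import re

def tokenize(text):
    # Lower-case and split on anything that is not a letter, digit or underscore.
    # This approximates Twitter's tokenisation; it is not the exact algorithm.
    return set(re.findall(r"\w+", text.lower()))

def match_keywords(tweet_text, phrases):
    # A phrase matches when ALL of its terms appear as tokens in the tweet,
    # in any order and ignoring case.
    tokens = tokenize(tweet_text)
    matched = []
    for phrase in phrases:
        terms = tokenize(phrase)
        if terms and terms <= tokens:
            matched.append(phrase)
    return matched

track = ["Katrina Kaif", "obama", "python"]
print(match_keywords("Loved Kaif, KATRINA that is, in #python", track))
```

With Tweepy you would call something like this from `on_status` with `tweet.text`. Because whole tokens are compared, "python" does not match "Pythonic", which is also how track is documented to behave.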
Hope that helps.