
Using Tweepy to listen to stream and search for tweets. How to stop previous search and only listen for new stream?

I'm using Flask and Tweepy to search for live tweets. On the front-end I have a text input and a button labeled "Search". Ideally, when a user enters a search term and clicks the "Search" button, Tweepy should start listening for the new search term and stop the stream for the previous one. Clicking the "Search" button executes this function:

@app.route('/search', methods=['POST'])
# gets search-keyword and starts stream
def streamTweets():
    search_term = request.form['tweet']
    search_term_hashtag = '#' + search_term
    # instantiate listener
    listener = StdOutListener()
    # stream object uses listener we instantiated above to listen for data
    stream = tweepy.Stream(auth, listener)

    if stream is not None:
        print "Stream disconnected..."
        stream.disconnect()

    stream.filter(track=[search_term or search_term_hashtag], async=True)
    redirect('/stream') # execute '/stream' sse
    return render_template('index.html')

The /stream route, which is invoked in the second-to-last line of the code above, is as follows:

@app.route('/stream')
def stream():
    # we will use Pub/Sub process to send real-time tweets to client
    def event_stream():
        # instantiate pubsub
        pubsub = red.pubsub()
        # subscribe to tweet_stream channel
        pubsub.subscribe('tweet_stream')
        # initiate server-sent events on messages pushed to channel
        for message in pubsub.listen():
            yield 'data: %s\n\n' % message['data']
    return Response(stream_with_context(event_stream()), mimetype="text/event-stream")

My code works fine, in the sense that it starts a new stream and searches for a given term whenever the "Search" button is clicked, but it does not stop the previous search. For example, if my first search term was "NYC" and then I wanted to search for a different term, say "Los Angeles", it will give me results for both "NYC" and "Los Angeles", which is not what I want. I want just "Los Angeles" to be searched. How do I fix this? In other words, how do I stop the previous stream? I looked through other previous threads, and I know I have to use stream.disconnect(), but I'm not sure how to implement this in my code. Any help or input would be greatly appreciated. Thanks so much!!

stthomas asked Dec 11 '14 19:12


2 Answers

Below is some code that will cancel old streams when a new stream is created. It works by adding new streams to a global list, and then calling stream.disconnect() on all streams in the list whenever a new stream is created.

diff --git a/app.py b/app.py
index 1e3ed10..f416ddc 100755
--- a/app.py
+++ b/app.py
@@ -23,6 +23,8 @@ auth.set_access_token(access_token, access_token_secret)
 app = Flask(__name__)
 red = redis.StrictRedis()

+# Add a place to keep track of current streams
+streams = []

 @app.route('/')
 def index():
@@ -32,12 +34,18 @@ def index():
 @app.route('/search', methods=['POST'])
 # gets search-keyword and starts stream
 def streamTweets():
+        # cancel old streams
+        for stream in streams:
+            stream.disconnect()
+
        search_term = request.form['tweet']
        search_term_hashtag = '#' + search_term
        # instantiate listener
        listener = StdOutListener()
        # stream object uses listener we instantiated above to listen for data
        stream = tweepy.Stream(auth, listener)
+        # add this stream to the global list
+        streams.append(stream)
        stream.filter(track=[search_term or search_term_hashtag],
                async=True) # make sure stream is non-blocking
        redirect('/stream') # execute '/stream' sse

What this does not solve is the problem of session management. With your current setup a search by one user will affect the searches of all users. This can be avoided by giving your users some identifier and storing their streams along with their identifier. The easiest way to do this is likely to use Flask's session support. You could also do this with a requestId as Pierre suggested. In either case you will also need code to notice when a user has closed the page and close their stream.
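
As a rough sketch of the per-user bookkeeping described above (not tested against Tweepy itself): `StreamRegistry` and `FakeStream` are hypothetical names made up for illustration, and the only thing assumed about a real stream is that it has the `disconnect()` method already used in the diff. The key can be whatever identifier you choose, for example a value stored in Flask's session.

```python
import threading


class StreamRegistry:
    """Hypothetical helper: keeps at most one live stream per user key."""

    def __init__(self):
        self._streams = {}
        self._lock = threading.Lock()

    def replace(self, user_key, new_stream):
        """Register new_stream for user_key and disconnect the one it replaces."""
        with self._lock:
            old_stream = self._streams.get(user_key)
            self._streams[user_key] = new_stream
        # Disconnect outside the lock so a slow disconnect blocks nobody else.
        if old_stream is not None and old_stream is not new_stream:
            old_stream.disconnect()
        return old_stream

    def close(self, user_key):
        """Disconnect and forget the stream for user_key; True if one existed."""
        with self._lock:
            old_stream = self._streams.pop(user_key, None)
        if old_stream is not None:
            old_stream.disconnect()
        return old_stream is not None


class FakeStream:
    """Stand-in for a Tweepy stream; only disconnect() is assumed."""

    def __init__(self, term):
        self.term = term
        self.running = True

    def disconnect(self):
        self.running = False


if __name__ == "__main__":
    registry = StreamRegistry()
    nyc = FakeStream("NYC")
    registry.replace("user-1", nyc)
    registry.replace("user-1", FakeStream("Los Angeles"))
    print(nyc.running)  # the NYC stream was disconnected by the second search
```

In `streamTweets()` you would call something like `registry.replace(user_key, stream)` right after creating the stream, and `registry.close(user_key)` once you notice the user has closed the page.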

MattL answered Oct 06 '22 16:10


Disclaimer: I know nothing about Tweepy, but this appears to be a design issue.

Are you trying to add state to a RESTful API? If so, you may have a design problem. As JRichardSnape answered, your API shouldn't be the one that takes care of canceling a request; that should be done in the front-end. What I mean is that the JavaScript/AJAX code calling this function should make another call, to a new function:

@app.route('/cancelSearch', methods=['POST'])

whose POST body carries the search terms. As long as you don't have state, you can't really do this safely in an async call: imagine someone else makes the same search at the same time; canceling one will then cancel both (remember, you don't have state, so you don't know whose search you're canceling). Perhaps your design does need state.

If you must keep using this design and don't mind breaking the "stateless" rule, then add state to your request. In this case it's not so bad, because you could launch a thread, name it with the userId, and then kill that thread on every new search:

def streamTweets():
    search_term = request.form['tweet']
    # Assumes a limit of one request per user at a time. If multiple windows
    # can be open and you still want that limit, store userId in a cookie.
    userId = request.form['userId']
    # Look for any request currently running with this ID, and cancel it

Alternatively, you could return a requestId, which the front-end would keep and later pass to cancelSearch?requestId=$requestId. In cancelSearch, you would have to find the pending request (it sounds like that lives in Tweepy, since you're not using your own threads) and disconnect it.
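
A minimal sketch of the requestId idea, kept independent of Flask and Tweepy so it can be run on its own: `start_search`, `cancel_search`, the `pending` dict and `DummyStream` are made-up names for illustration, and the stream object is only assumed to expose `disconnect()`.

```python
import threading
import uuid

# Hypothetical bookkeeping: request id -> stream-like object.
pending = {}
pending_lock = threading.Lock()


def start_search(stream):
    """Remember a running stream and return the id the front-end should keep."""
    request_id = uuid.uuid4().hex
    with pending_lock:
        pending[request_id] = stream
    return request_id


def cancel_search(request_id):
    """Disconnect the stream for request_id; False if the id is unknown."""
    with pending_lock:
        stream = pending.pop(request_id, None)
    if stream is None:
        return False
    stream.disconnect()
    return True


class DummyStream:
    """Stand-in for a Tweepy stream; only disconnect() is assumed."""

    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False


if __name__ == "__main__":
    stream = DummyStream()
    request_id = start_search(stream)
    cancel_search(request_id)
    print(stream.connected)  # the stream has been disconnected
```

A `/cancelSearch` route would then just read the id sent by the front-end and pass it to `cancel_search`. Because each id is unique per search, canceling one search cannot cancel someone else's identical search.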

Out of curiosity I just watched what happens when you search on Google, and it uses a GET request. Have a look (debug tools -> Network; then enter some text and watch the autofill requests). Google sends a token with every request (every time you type something). That doesn't mean the token is used for this, but it's basically what I described. If you don't want a session, then use a unique identifier.

Pierre-Francoys Brousseau answered Oct 06 '22 16:10