Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tweepy Streaming - Stop collecting tweets at x amount

I'm looking to have the Tweepy Streaming API stop pulling in tweets after I have stored x # of tweets in MongoDB.

I have tried IF and WHILE statements inside the class, defintion with counters, but cannot get it to stop at a certain X amount. This is a real head-banger for me. I found this link here: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate this have failed. It always tells me that init needs an additional argument. I believe we have our Tweepy auth set different, so it is not apples to apples.

Any thoughts?

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json, time, sys

import tweepy
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

class StdOutListener(StreamListener):

    def on_status(self, status):
        text = status.text
        created = status.created_at
        record = {'Text': text, 'Created At': created}
        print record #See Tweepy documentation to learn how to access other fields
        collection.insert(record)  


    def on_error(self, status):
        print 'Error on status', status

    def on_limit(self, status):
        print 'Limit threshold exceeded', status

    def on_timeout(self, status):
        print 'Stream disconnected; continuing...'


stream = Stream(auth, StdOutListener())
stream.filter(track=['tv'])
like image 530
AngryWhopper Avatar asked Dec 31 '13 21:12

AngryWhopper


People also ask

How can I get more than 100 tweets on Tweepy?

If you need more than 100 Tweets, you have to use the paginator method and specify the limit i.e. the total number of Tweets that you want. Replace limit=1000 with the maximum number of tweets you want. Replace the limit=1000 with the maximum number of tweets you want (gist).

Is a Python library for accessing the Twitter API Tweepy?

Tweepy is an open-sourced, easy-to-use Python library for accessing the Twitter API. It gives you an interface to access the API from your Python application. Alternatively, you can also install it from the GitHub repository.


1 Answers

You need to add a counter inside of your class in __init__, and then increment it inside of on_status. Then when the counter is below 20 it will insert a record into the collection. This could be done as show below:

def __init__(self, api=None):
    super(StdOutListener, self).__init__()
    self.num_tweets = 0

def on_status(self, status):
    record = {'Text': status.text, 'Created At': status.created_at}
    print record #See Tweepy documentation to learn how to access other fields
    self.num_tweets += 1
    if self.num_tweets < 20:
        collection.insert(record)
        return True
    else:
        return False
like image 75
Nat Dempkowski Avatar answered Sep 23 '22 06:09

Nat Dempkowski