
How can I consume tweets from Twitter's streaming API and store them in MongoDB?


I need to develop an app that lets me track tweets and save them in a MongoDB database for a research project (as you might gather, I am a noob, so please bear with me). I found this piece of code, which streams tweets to my terminal window:

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = "" 


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Print the text of each matching tweet to the terminal
        print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['Gandolfini'])

Is there a way to modify this code so that, instead of streaming the tweets to my screen, it sends them to my MongoDB database?

Thanks

asked Jun 20 '13 at 12:06 by user2161725


People also ask

Is Tweepy a Python library for accessing the Twitter API?

Tweepy is an open-source, easy-to-use Python library for accessing the Twitter API. It gives your Python application an interface to the API. You can install it with pip or, alternatively, from its GitHub repository.

How do I extract tweets using Tweepy?

To obtain the keys: on your Twitter application's page, click "Create my access token"; the page will refresh and generate an access token. Tweepy itself is installed with pip. To authorize your app to access Twitter on your behalf, you then use the OAuth interface (tweepy.OAuthHandler, as in the code above).
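Both code samples on this page hard-code the four OAuth values as empty strings. A minimal sketch, assuming you export the keys as environment variables (the variable names below are hypothetical, not anything Twitter or Tweepy requires), keeps them out of your source file and fails early if one is missing:

```python
import os

# Hypothetical environment-variable names; choose whatever names you like.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_KEY",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four OAuth values, raising if any is missing or empty."""
    creds = {}
    missing = []
    for name, var in CREDENTIAL_VARS.items():
        value = environ.get(var, "")
        if not value:
            missing.append(var)
        creds[name] = value
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return creds


# Usage with tweepy (not executed here):
#   creds = load_credentials()
#   auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
#   auth.set_access_token(creds["access_key"], creds["access_secret"])
```

Passing `environ` as a parameter is only a convenience so the function can be exercised with a plain dict.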

How does MongoDB store twitter data?

Since JSON and BSON are so similar, storing a tweet in a MongoDB database is straightforward: parse the tweet's JSON string into a document (a Python dict, when using pymongo) and pass it to an insert call. Retrieving or searching the stored tweets is fairly simple as well, although it requires thinking in terms of documents and query objects rather than the traditional SQL command structure.
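A minimal sketch of that parse-then-insert step, assuming tweets arrive as Twitter v1.1 JSON strings. Converting `created_at` to a `datetime` is an optional extra (not part of the original text) so that the field is stored as a BSON date and date-range queries work. `store_tweet` accepts any object with an `insert_one` method, such as a pymongo `Collection`; the function names are illustrative.

```python
import json
from datetime import datetime

# Twitter v1.1 timestamps look like "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_document(raw_json):
    """Parse a raw tweet JSON string into a dict ready for insertion.

    MongoDB stores documents, not JSON text, so the string is parsed first.
    created_at is turned into a timezone-aware datetime so pymongo stores it
    as a date rather than as a plain string.
    """
    doc = json.loads(raw_json)
    if "created_at" in doc:
        doc["created_at"] = datetime.strptime(doc["created_at"], TWITTER_TIME_FORMAT)
    return doc


def store_tweet(collection, raw_json):
    """Insert one tweet into `collection` (anything with insert_one(dict))."""
    collection.insert_one(tweet_to_document(raw_json))
```

With a real server, `store_tweet(pymongo.MongoClient().test.tweets, raw_json)` would target the same database and collection as the first answer below.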


2 Answers

Here's an example:

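import json

import pymongo
import tweepy  # tweepy 3.x: StreamListener was removed in tweepy 4.0

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        # Initialise StreamListener itself (it stores api as self.api)
        super(CustomStreamListener, self).__init__(api)
        # Connect to the local MongoDB server and use the "test" database
        self.db = pymongo.MongoClient().test

    def on_data(self, tweet):
        # Each raw message is a JSON string: parse it and store it as a document.
        # insert_one is the PyMongo 3+ call; the old insert() was removed in PyMongo 4.
        self.db.tweets.insert_one(json.loads(tweet))
        return True # Keep the stream running

    def on_error(self, status_code):
        return True # Don't kill the stream

    def on_timeout(self):
        return True # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=['Gandolfini'])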

This writes each incoming tweet, as a document, to the tweets collection of the test database on your local MongoDB server.
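Note that on_data receives every message on the stream, not only tweets: the v1.1 streaming API also sends control messages such as {"delete": ...} and {"limit": ...}, and the listener above inserts those into the tweets collection too. A minimal sketch of a filter, assuming a message counts as a tweet when it has a top-level "text" field (a simplification); `handle_stream_message` is a hypothetical helper, and `collection` is anything with an `insert_one` method, such as `self.db.tweets`:

```python
import json


def handle_stream_message(raw, collection):
    """Insert raw stream data only if it looks like a tweet.

    Returns True if a document was inserted, False if the message was
    skipped (control notices or unparseable data).
    """
    try:
        message = json.loads(raw)
    except ValueError:
        # Malformed or partial payload: skip it instead of crashing the stream.
        return False
    if not isinstance(message, dict) or "text" not in message:
        return False  # delete/limit and other notices are not tweets
    collection.insert_one(message)
    return True


# Inside the tweepy 3.x listener above (not executed here):
#     def on_data(self, tweet):
#         handle_stream_message(tweet, self.db.tweets)
#         return True
```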

Hope that helps.

answered Jan 23 '23 at 00:01 by alecxe


I have developed a simple command line tool that does exactly this.

https://github.com/janezkranjc/twitter-tap

It can use either the streaming API or the search API.

answered Jan 23 '23 at 00:01 by johnny