 

How to add a location filter to tweepy module


I have found the following piece of code, which works pretty well for letting me view the standard 1% of the Twitter firehose in the Python shell:

import sys
import tweepy

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


# Tweepy 3.x API: StreamListener was removed in Tweepy 4.0
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])

How do I add a filter so that only tweets from a certain location are parsed? I've seen people adding GPS to other Twitter-related Python code, but I can't find anything specific to sapi within the Tweepy module.

Any ideas?

Thanks

gdogg371 asked Apr 06 '14 01:04


People also ask

How do you find a user's location with Tweepy?

To get the location, do the following: identify the user ID or the screen name of the profile; get the User object of the profile using the get_user() method with the user ID or the screen name; then read the location attribute of that User object.
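A minimal sketch of those steps, assuming `api` is an authenticated `tweepy.API` object (such as the `api` built in the question's code). The helper name `get_user_location` and the screen name in the usage line are hypothetical. Note that `location` is whatever free text the user typed into their profile, so it may be empty or not a real place.

```python
def get_user_location(api, screen_name):
    """Hypothetical helper: return the profile location of `screen_name`.

    `api` is assumed to be an authenticated tweepy.API instance; any object
    with a compatible get_user(screen_name=...) method will also work.
    """
    # Steps 1 and 2: look up the User object by screen name
    user = api.get_user(screen_name=screen_name)
    # Step 3: `location` is free text from the user's profile (may be ""),
    # not a GPS geotag
    return user.location


# Usage (requires real credentials; "some_screen_name" is a placeholder):
# print(get_user_location(api, "some_screen_name"))
```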

What is RPP in Tweepy?

RPP ("results per page") was a parameter of the old Search API. It was replaced by count: the number of tweets to return per page, up to a maximum of 100. Defaults to 15.

What is Tweepy StreamListener?

The default StreamListener can classify most common Twitter messages and route them to appropriately named methods, but these methods are only stubs. Therefore, using the streaming API has three steps: create a class inheriting from StreamListener; using that class, create a Stream object; connect to the Twitter API using the Stream.

How can I get more than 100 tweets on Tweepy?

If you need more than 100 tweets, you have to use a paginator and specify a limit, i.e. the total number of tweets that you want; for example, limit=1000 stops after at most 1,000 tweets.


2 Answers

The streaming API doesn't allow filtering by location AND keyword simultaneously.

Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.

Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
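To make the OR semantics concrete, here is a simplified model in plain Python (no Tweepy needed). It is only an illustration, not Twitter's matching code: it treats each `track` entry as a case-insensitive substring and only tests a tweet's exact point against the box, whereas the real service applies its own keyword and place-matching rules. The function name is hypothetical.

```python
def matches_stream_filter(text, point, track, bbox):
    """Simplified model of combining `track` and `locations` in one request.

    text  -- tweet text
    point -- (longitude, latitude) of the tweet, or None if not geotagged
    track -- list of keyword phrases
    bbox  -- [sw_lon, sw_lat, ne_lon, ne_lat], same order as `locations`
    """
    lowered = text.lower()
    # Simplification: each phrase is a case-insensitive substring test
    keyword_hit = any(phrase.lower() in lowered for phrase in track)

    location_hit = False
    if point is not None:
        lon, lat = point
        sw_lon, sw_lat, ne_lon, ne_lat = bbox
        location_hit = sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat

    # The two filters are ORed, not ANDed
    return keyword_hit or location_hit


SF_BOX = [-122.75, 36.8, -121.75, 37.8]  # the box from the quoted example
print(matches_stream_filter("I love Twitter", None, ["twitter"], SF_BOX))  # True
print(matches_stream_filter("Nice day", (-122.42, 37.77), ["twitter"], SF_BOX))  # True
print(matches_stream_filter("Nice day", (-0.13, 51.51), ["twitter"], SF_BOX))  # False
```

A non-geotagged tweet mentioning Twitter and a geotagged San Francisco tweet without the keyword both match, which is why the AND has to happen in your own code.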

What you can do is ask the streaming API for either keyword-matched or location-based tweets and then filter the resulting stream in your app by inspecting each tweet.

If you modify the code as follows, you will capture tweets from the United Kingdom; those tweets are then filtered to show only the ones that contain "manchester united":

import sys
import tweepy

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


# Tweepy 3.x API: StreamListener was removed in Tweepy 4.0
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # App-side keyword filter applied to the location-filtered stream
        if 'manchester united' in status.text.lower():
            print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
# Bounding box for the UK: SW longitude, SW latitude, NE longitude, NE latitude
sapi.filter(locations=[-6.38, 49.87, 1.77, 55.81])
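One caveat with checking `status.text`: in the v1.1 streaming payload, long tweets can arrive truncated in `text`, with the complete text under `extended_tweet.full_text`, so a keyword near the end could be missed. Below is a sketch of the app-side check that accounts for this, working on the raw tweet dict (Tweepy keeps it as `status._json`). The helper names are hypothetical and the payload shape is an assumption based on the v1.1 tweet format.

```python
def full_text(tweet):
    """Return the untruncated text of a v1.1 tweet dict.

    Assumption: long tweets carry their complete text in
    tweet["extended_tweet"]["full_text"], with a shortened "text" field.
    """
    extended = tweet.get("extended_tweet") or {}
    return extended.get("full_text", tweet.get("text", ""))


def keep_tweet(tweet, phrases=("manchester united",)):
    """App-side keyword check; the stream itself already applied the UK box.

    `phrases` are expected in lowercase, since the text is lowercased.
    """
    text = full_text(tweet).lower()
    return any(phrase in text for phrase in phrases)


# Inside CustomStreamListener.on_status this could be used as:
#     if keep_tweet(status._json):
#         print(full_text(status._json))
```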
Juan E. answered Oct 25 '22 21:10


Juan gave the correct answer. I'm filtering for Germany only using this:

# Bounding boxes for geolocations
# Online tool to create boxes (copy and paste as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180, -90, 180, 90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]

# `stream` is a tweepy Stream object, such as `sapi` in the code above
stream.filter(locations=GEOBOX_GERMANY)

This is a pretty crude box that includes parts of some neighbouring countries. If you want finer granularity, you can combine multiple boxes to fill out the location you need.
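The streaming API's `locations` parameter accepts several boxes concatenated into one flat list of numbers, each box given as south-west longitude/latitude followed by north-east longitude/latitude. Here is a minimal sketch of building that list from several boxes, plus a point-in-box check you could run on a tweet's `[longitude, latitude]` coordinates to refine the results in your app. Both helpers, the `another_box` name in the usage line, and the validation rule (no boxes crossing the 180th meridian) are assumptions of this sketch, not part of Tweepy.

```python
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]


def flatten_boxes(boxes):
    """Concatenate [sw_lon, sw_lat, ne_lon, ne_lat] boxes into one flat list
    suitable for stream.filter(locations=...)."""
    flat = []
    for box in boxes:
        sw_lon, sw_lat, ne_lon, ne_lat = box  # fails if a box isn't 4 numbers
        # This sketch rejects inverted boxes and boxes crossing the 180th meridian
        if not (-180 <= sw_lon < ne_lon <= 180 and -90 <= sw_lat < ne_lat <= 90):
            raise ValueError(f"invalid bounding box: {box}")
        flat.extend(box)
    return flat


def in_any_box(lon, lat, boxes):
    """True if (lon, lat) lies inside at least one box. Tweet "coordinates"
    are GeoJSON, so the order is [longitude, latitude]."""
    return any(b[0] <= lon <= b[2] and b[1] <= lat <= b[3] for b in boxes)


# Usage with the `stream` object and a hypothetical second box:
# stream.filter(locations=flatten_boxes([GEOBOX_GERMANY, another_box]))
```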

It should be noted, though, that filtering by geotags limits the number of tweets quite a bit. This is from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):

> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598

So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
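If your tweets are in memory rather than in MongoDB, the same share can be computed in plain Python. This is a minimal sketch, assuming each tweet is a v1.1 tweet dict; like the `$ne: null` query, it counts a tweet as geotagged when its `coordinates` field is present and not null. Multiplying the result by 100 gives a percentage, e.g. 0.01667 × 100 ≈ 1.67%. The sample data below is made up for illustration.

```python
def geotagged_fraction(tweets):
    """Fraction of tweet dicts whose "coordinates" field is present and not None."""
    if not tweets:
        return 0.0  # avoid dividing by zero on an empty sample
    geotagged = sum(1 for tweet in tweets if tweet.get("coordinates") is not None)
    return geotagged / len(tweets)


sample = [
    {"coordinates": {"type": "Point", "coordinates": [-2.29, 53.46]}},
    {"coordinates": None},
    {"text": "no coordinates key at all"},
    {"coordinates": None},
]
print(geotagged_fraction(sample))  # 0.25
```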

Kristian Rother answered Oct 25 '22 20:10