
Exclude retweets from twitter streaming api using tweepy

When using the Python Tweepy library to pull tweets from Twitter's streaming API, is it possible to exclude retweets?

For instance, suppose I want only the tweets posted by a particular user, e.g. twitterStream.filter(follow = ["20264932"]). This also returns retweets, and I would like to exclude them. How can I do this?

Thank you in advance.

Daniel asked Apr 17 '15 03:04




2 Answers

Just checking a tweet's text to see if it starts with 'RT' is not really a robust solution. You need to make a decision about what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets.

Sometimes people type RT at the beginning of a Tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but signifies that they are quoting another user's Tweet.

If you're going by the 'official' definition, then you want to filter tweets out if they have a True value for their retweeted attribute, like this:

if not tweet['retweeted']:
    # do something with standard tweets
    pass
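
A caveat worth checking against the tweet object documentation: in the v1.1 payload, as far as I know, the retweeted field reports whether the authenticating user has retweeted the tweet, while an official retweet carries a nested retweeted_status object. A minimal sketch under that assumption, treating tweets as plain dicts; the helper name is_official_retweet and the sample payloads are made up for illustration:

```python
def is_official_retweet(tweet):
    """Return True if the decoded tweet dict carries a 'retweeted_status' object."""
    return 'retweeted_status' in tweet


# Hypothetical, heavily trimmed payloads for illustration only.
original = {'text': 'Hello world', 'retweeted': False}
retweet = {
    'text': 'RT @someone: Hello world',
    'retweeted': False,  # the receiving account has not retweeted this itself
    'retweeted_status': {'text': 'Hello world'},
}

# Keep only the tweets that are not official retweets.
standard = [t for t in (original, retweet) if not is_official_retweet(t)]
```

Here standard ends up holding only the first payload, even though both payloads have retweeted set to False.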

If you want to be more inclusive, including 'unofficial' retweets, check the text for the substring 'RT @' rather than merely testing whether it starts with 'RT'. The substring check is cleaner and avoids edge cases where a tweet starts with 'RT' but isn't a retweet (with this much data out there, I'm sure that happens). Here's some code for that:

if not tweet['retweeted'] and 'RT @' not in tweet['text']:
    # do something with standard tweets
    pass

This second conditional keeps only the tweets that pass both tests: they are not flagged as retweeted, and their text does not contain 'RT @'. What remains should be regular tweets.
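
To see how the two text heuristics differ, here is a small sketch that runs both on a few made-up strings. The function names and sample texts are assumptions for illustration, not part of any library:

```python
def starts_with_rt(text):
    """Heuristic 1: the text begins with the letters 'RT'."""
    return text.startswith('RT')


def contains_rt_mention(text):
    """Heuristic 2: the text contains 'RT @' anywhere."""
    return 'RT @' in text


samples = [
    'RT @alice: tweepy is handy',   # classic manual retweet
    'RTFM is still good advice',    # starts with 'RT' but is not a retweet
    'So true! RT @bob: ship it',    # comment placed before the retweet
    'Just a regular tweet',
]

# Print both verdicts next to each sample text.
for text in samples:
    print(starts_with_rt(text), contains_rt_mention(text), text)
```

The second sample is a false positive for the prefix check, and the third is a manual retweet that the prefix check misses but the substring check catches.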

foundling answered Sep 18 '22 12:09


Yes, there are a few ways to do this. One of them is to check whether the text of the tweet starts with RT, using the .startswith() method on strings. To do that, change the on_data() method in your streaming class, for example:

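import json
import tweepy

# tweepy.StreamListener is the Tweepy 3.x API (merged into tweepy.Stream in 4.0)
class TwitterStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        # Twitter returns data in JSON format - we need to decode it first
        decoded = json.loads(data)
        # Not every stream message is a tweet, so 'text' may be missing
        text = decoded.get('text', '')
        if text and not text.startswith('RT'):
            # Do processing here
            print(text.encode('ascii', 'ignore').decode('ascii'))
        return True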
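
on_data() receives every message on the stream, and, as far as I recall from the streaming documentation, a follow filter also delivers replies to and retweets of the followed account. Below is a hedged sketch of a standalone predicate that on_data() could call so the filtering can be tried without a live stream. The name keep_message, the reliance on the user.id_str and retweeted_status fields, and the sample messages are all assumptions for illustration:

```python
import json


def keep_message(raw, followed_id):
    """Decide whether one raw stream message is an original tweet by followed_id."""
    decoded = json.loads(raw)
    text = decoded.get('text')
    if text is None:
        return False  # not a tweet (e.g. a delete or limit notice)
    if 'retweeted_status' in decoded or text.startswith('RT @'):
        return False  # official or manual-style retweet
    # Assumed field: the author's id as a string under user.id_str.
    return decoded.get('user', {}).get('id_str') == followed_id


# Made-up raw messages shaped like trimmed stream payloads.
own = json.dumps({'text': 'New release today', 'user': {'id_str': '20264932'}})
reply = json.dumps({'text': '@someone congrats', 'user': {'id_str': '111'}})
notice = json.dumps({'delete': {'status': {'id_str': '1'}}})

print([keep_message(m, '20264932') for m in (own, reply, notice)])
```

Only the first message is kept: the second comes from a different author and the third carries no text at all.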

ZdaR answered Sep 17 '22 12:09