When using the Python Tweepy library to pull tweets from Twitter's streaming API, is it possible to exclude retweets?
For instance, I want only the tweets posted by a particular user, e.g. twitterStream.filter(follow=["20264932"]), but this also returns retweets, and I would like to exclude them. How can I do this?
Thank you in advance.
You can turn off Retweets for a specific account if you don't like what they share. Select Turn off Retweets from an account profile page to stop seeing Tweets they've Retweeted (tap the gear icon on iOS or click or tap the overflow icon on web and Android).
Log into Twitter. Navigate to the profile of the account you'd like to stop seeing retweets from. Click the circular icon with three horizontal dots to the right of their profile picture. Select the first option in this menu labeled "Turn off Retweets."
Tweepy provides the API interface for the Twitter API v1.1. For the v2 API, Tweepy provides the Client interface, which is available from Tweepy v4.
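For the v2 API, retweets can usually be excluded on the server side through the filtered-stream rule syntax, where `from:` selects an author and `-is:retweet` negates the retweet operator. The sketch below only builds such a rule string; the helper name `build_no_retweet_rule` is hypothetical, and registering the rule with Tweepy's v2 streaming client is assumed rather than shown, since it needs credentials.

```python
def build_no_retweet_rule(user_id):
    """Build a v2 filtered-stream rule: tweets from one user, minus retweets."""
    # 'from:' accepts a username or a numeric user ID;
    # '-is:retweet' drops official retweets from the matches.
    return f"from:{user_id} -is:retweet"


# Assumption: with Tweepy 4.6 or later, this string would be wrapped in a
# stream rule and added to the streaming client before connecting.
rule = build_no_retweet_rule("20264932")
print(rule)
```

Filtering in the rule means retweets never reach your code, so no client-side check is needed for official retweets.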
Just checking a tweet's text to see if it starts with 'RT' is not really a robust solution. You need to make a decision about what you will consider a retweet, since it isn't exactly clear-cut. The Twitter API docs explain that tweets with 'RT' in the tweet text aren't officially retweets.
Sometimes people type RT at the beginning of a Tweet to indicate that they are re-posting someone else's content. This isn't an official Twitter command or feature, but signifies that they are quoting another user's Tweet.
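If you're going by the 'official' definition, then you want to filter tweets out if they have a True value for their retweeted attribute. (Be aware that in the v1.1 tweet object, retweeted reports whether the authenticating user has retweeted the tweet; official retweets also carry a retweeted_status key, so checking for that key is often the more reliable test.) The attribute check looks like this: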
    if not tweet['retweeted']:
        # do something with standard tweets
        pass
And if you want to be more inclusive and also catch 'unofficial' retweets, you should check the text for the substring 'RT @' rather than merely testing whether it starts with 'RT'. The former is cleaner and avoids edge cases where a tweet starts with 'RT' but isn't a retweet (with so much data out there, I'm sure this happens). Here's some code for that:
    if not tweet['retweeted'] and 'RT @' not in tweet['text']:
        # do something with standard tweets
        pass
The latter conditional takes the subset of tweets in your collection that are regular tweets and does an intersection with the subset of tweets in your collection that do not have 'RT @' in the tweet text, leaving you with tweets that are supposedly regular tweets.
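As a rough illustration of the two definitions, here is a minimal sketch that works on already-decoded v1.1 tweet dictionaries. The helper name `is_retweet` and the sample tweets are made up for the example, and the official test uses the `retweeted_status` key, which is an assumption about which field you want to rely on.

```python
def is_retweet(tweet, include_unofficial=True):
    """Return True if a decoded v1.1 tweet dict looks like a retweet."""
    # Official retweets embed the original tweet under 'retweeted_status'.
    if 'retweeted_status' in tweet:
        return True
    # Optional heuristic for manual "RT @user ..." re-posts.
    return include_unofficial and 'RT @' in tweet.get('text', '')


# Hypothetical sample data, not real tweets.
samples = [
    {'text': 'RT @alice: hello', 'retweeted_status': {'text': 'hello'}},
    {'text': 'So true! RT @bob: cats are great'},
    {'text': 'RTFM is still good advice'},
]
originals = [t['text'] for t in samples if not is_retweet(t)]
print(originals)
```

The third sample shows why a plain starts-with-'RT' test is too broad: it begins with 'RT' but is not a retweet, and the 'RT @' check keeps it.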
Yes, there are ways of doing this. One of them is to check whether the text of the tweet starts with RT. For this we can use the .startswith() method on strings. You need to change the on_data() method in your streaming class, which can be done as:
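    import json
    import tweepy

    # Note: tweepy.StreamListener is the Tweepy 3.x interface
    class TwitterStreamListener(tweepy.StreamListener):
        def on_data(self, data):
            # Twitter returns data in JSON format - we need to decode it first
            decoded = json.loads(data)
            # Non-tweet messages (e.g. delete notices) have no 'text' key
            text = decoded.get('text', '')
            if text and not text.startswith('RT'):
                # Do processing here
                print(text.encode('ascii', 'ignore').decode('ascii'))
            return True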
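The decoding and filtering step can also be pulled out of the listener so it can be tried without a live stream. This is only a sketch under that assumption: the function name `text_if_not_retweet` and the raw messages are invented for illustration, and it uses the same starts-with-'RT' test as the listener above.

```python
import json


def text_if_not_retweet(data):
    """Decode one raw stream message; return its text, or None to skip it."""
    decoded = json.loads(data)
    text = decoded.get('text')
    # Skip non-tweet messages (no 'text') and anything starting with 'RT'.
    if text is None or text.startswith('RT'):
        return None
    return text


# Hypothetical raw payloads, shaped like JSON strings from the stream.
raw = [
    json.dumps({'text': 'Shipping a new release today'}),
    json.dumps({'text': 'RT @alice: hello'}),
    json.dumps({'delete': {'status': {'id': 1}}}),
]
kept = [t for t in map(text_if_not_retweet, raw) if t is not None]
print(kept)
```

Inside on_data() you could call such a function and process the result only when it is not None.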