
Tweepy Streaming API: full text

I am using the Tweepy Streaming API to get tweets containing a particular hashtag. The problem is that I am unable to extract the full text of a tweet from the Streaming API: only 140 characters are available, and anything after that is truncated.

Here is the code:

import io
from datetime import datetime
from time import sleep

import tweepy

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)


def analyze_status(text):
    # True if the text looks like a retweet ("RT" within the first three characters)
    return 'RT' in text[0:3]


class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Skip retweets; append every other tweet to a file and echo it
        if not analyze_status(status.text):
            with io.open('fetched_tweets.txt', 'a', encoding='utf-8') as tf:
                tf.write(status.text + u'\n\n')

            print(status.text)

    def on_error(self, status):
        # status is the HTTP status code (an int), so convert it before concatenating
        print("Error Code : " + str(status))


def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Convert the reset timestamp to local time so it can be compared with datetime.now()
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit. The user needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True


myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener,
                         tweet_mode='extended')

# This argument was named `async` before Tweepy 3.7; `async` is a reserved word in Python 3.7+
myStream.filter(track=['#bitcoin'], is_async=True)

Does anyone have a solution?

Varad Bhatnagar asked Jan 18 '18 10:01


2 Answers

tweet_mode=extended will have no effect in this code, since the Streaming API does not support that parameter. If a Tweet contains longer text, it will contain an additional object in the JSON response called extended_tweet, which will in turn contain a field called full_text.

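In that case, you'll want something like print(status.extended_tweet['full_text']) to extract the longer text. (Tweepy exposes extended_tweet as a plain dict, so the field is read with a key lookup rather than attribute access.)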
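
To make the payload shape described above concrete, here is a minimal sketch that works on the raw JSON of a streamed Tweet rather than on a Tweepy object. The helper name full_text_from_payload and the sample payload are made up for illustration; only the text, truncated, extended_tweet and full_text fields come from the Streaming API.

```python
import json


def full_text_from_payload(payload):
    """Return the fullest text available in a streamed Tweet payload (a dict)."""
    # Long Tweets carry an extra "extended_tweet" object holding "full_text".
    extended = payload.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    # Short Tweets have no "extended_tweet"; "text" is already complete.
    return payload.get("text", "")


# Hypothetical, heavily trimmed sample of what the stream might deliver.
raw = json.dumps({
    "text": "A long tweet that was cut off...",
    "truncated": True,
    "extended_tweet": {
        "full_text": "A long tweet that was cut off, shown here in full.",
    },
})

print(full_text_from_payload(json.loads(raw)))
```

Inside a stream listener, the same lookup could be applied to the parsed JSON of each incoming message.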

Andy Piper answered Oct 15 '22 22:10


The Twitter stream provides a Boolean attribute, status.truncated, which is True when the message contains more than 140 characters. Only in that case is the extended_tweet object available:

        if not status.truncated:
            text = status.text
        else:
            text = status.extended_tweet['full_text']

This works only when you are streaming tweets. When you are collecting older tweets using the API method you can use something like this:

tweets = api.user_timeline(screen_name='whoever', count=5, tweet_mode='extended')
for tweet in tweets:
    print(tweet.full_text)

This full_text field contains the text of all tweets, truncated or not.
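
If the same code has to handle both streamed tweets and tweets fetched through the REST API, the two cases can be folded into one helper. The following is only a sketch: get_text is a hypothetical name, and SimpleNamespace objects stand in for Tweepy Status objects so that it runs without credentials.

```python
from types import SimpleNamespace


def get_text(status):
    """Pick the fullest text from a Status-like object."""
    # REST API with tweet_mode='extended': the text lives in full_text.
    if hasattr(status, "full_text"):
        return status.full_text
    # Streaming API: truncated tweets carry an extended_tweet dict.
    if getattr(status, "truncated", False) and hasattr(status, "extended_tweet"):
        return status.extended_tweet["full_text"]
    # Short streamed tweets: text is already complete.
    return status.text


# Stand-ins for tweepy Status objects (made-up data, for illustration only).
streamed_long = SimpleNamespace(
    text="cut off...",
    truncated=True,
    extended_tweet={"full_text": "the whole streamed tweet"},
)
from_timeline = SimpleNamespace(full_text="the whole timeline tweet")
streamed_short = SimpleNamespace(text="short tweet", truncated=False)

for s in (streamed_long, from_timeline, streamed_short):
    print(get_text(s))
```

Checking for full_text first means the helper does not need to know which API produced the object.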

Sjoerd van Staveren answered Oct 15 '22 23:10