 

Save full text of a tweet with tweepy

I am a novice Python programmer. I am having trouble extracting the text of a series of tweets with tweepy and saving it to a text file (I omit the authentication and related setup):

# `api` is an authenticated tweepy.API instance (authentication omitted)
search = api.search("hello", count=10)

textlist = []

for i in range(0, len(search)):
    textlist.append(search[i].text.replace('\n', ''))

f = open('temp.txt', 'w', encoding='utf-8')
for i in range(0, len(textlist)):
    f.write(textlist[i] + '\n')
f.close()

But in some long tweets the text is truncated: an ellipsis character ("…") appears at the end of the string, so I sometimes lose links or hashtags. How can I avoid this?

adrpino asked Apr 12 '15 09:04



2 Answers

With tweepy, you can get the full text by passing tweet_mode='extended' (not documented in the Tweepy docs) and reading full_text instead of text. For instance:

(not extended)

print(api.get_status('862328512405004288')._json['text'])

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue d… https://tco/kALZ2ki9Vc

(extended)

print(api.get_status('862328512405004288', tweet_mode='extended')._json['full_text'])

@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue de match de foot et cela ferait un beau cadeau pour mon copain !! 🙏🏻🙏🏻🙏🏻
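
Applied to the search-and-save case from the question, the same idea might look like the sketch below. It assumes the search call accepts tweet_mode='extended' in the same way get_status does, and that for retweets the complete text has to be read from retweeted_status; the helper names full_text_of and save_texts are hypothetical, and stand-in objects replace real API results so the code runs without credentials.

```python
import os
import tempfile
from types import SimpleNamespace


def full_text_of(status):
    """Return the longest text available on a tweepy-style status object."""
    # Assumption: for a retweet, the untruncated text lives on the original
    # tweet, so prefer it (this drops the "RT @user:" prefix).
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        status = original
    # `full_text` is only present when the request used tweet_mode='extended';
    # otherwise fall back to the classic, possibly truncated, `text`.
    text = getattr(status, "full_text", None)
    if text is None:
        text = status.text
    return text


def save_texts(statuses, path):
    """Write one tweet per line, flattening embedded newlines."""
    with open(path, "w", encoding="utf-8") as f:
        for status in statuses:
            f.write(full_text_of(status).replace("\n", " ") + "\n")


# Stand-in objects so the sketch runs offline. With tweepy this would be
# something like: statuses = api.search("hello", count=10, tweet_mode="extended")
statuses = [
    SimpleNamespace(full_text="a long tweet\nwith a link"),
    SimpleNamespace(text="short tweet"),
    SimpleNamespace(
        full_text="RT @someone: cut off…",
        retweeted_status=SimpleNamespace(full_text="the complete original"),
    ),
]
path = os.path.join(tempfile.mkdtemp(), "temp.txt")
save_texts(statuses, path)
```

Falling back to text keeps the helper usable on results fetched without the extended mode, at the cost of silently returning truncated text in that case.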

mountrix answered Sep 22 '22 02:09


The ellipsis (...) is added when the tweet is a retweet and is therefore truncated. This is mentioned in the documentation:

Indicates whether the value of the text parameter was truncated, for example, as a result of a retweet exceeding the 140 character Tweet length. Truncated text will end in ellipsis, like this ...

There is no way to avoid this unless you take each individual tweet, search for any retweets of it, and build the complete timeline. This obviously isn't practical for a simple search, but you could do it if you were fetching a particular handle's timeline.
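
If the truncated tweets cannot be avoided, they can at least be detected before saving. The sketch below is a heuristic under stated assumptions: it reads the truncated flag described in the documentation quote, and as a fallback (my assumption, not documented behaviour) treats a trailing ellipsis before an optional final link as a sign of truncation. The function name looks_truncated is hypothetical, and stand-in objects are used instead of real API results.

```python
from types import SimpleNamespace


def looks_truncated(status):
    """Guess whether a status's `text` was cut short."""
    # The API's own flag, when present, is the most direct signal.
    if getattr(status, "truncated", False):
        return True
    # Fallback heuristic (an assumption): ignore a final link, then check
    # whether the last remaining word ends in an ellipsis.
    words = status.text.split()
    if words and words[-1].startswith("http"):
        words = words[:-1]
    return bool(words) and words[-1].endswith(("…", "..."))


# Stand-in objects so the sketch runs without network access.
samples = [
    SimpleNamespace(text="hello world", truncated=False),
    SimpleNamespace(text="J'ai jamais vue d… https://example.invalid/x", truncated=False),
    SimpleNamespace(text="flagged by the API", truncated=True),
]
flags = [looks_truncated(s) for s in samples]
```

A tweet that genuinely ends in an ellipsis would be flagged too, so this is better suited to logging or skipping suspicious entries than to anything stricter.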

You can also simplify your code:

results = api.search('hello', count=10)

# Write one tweet per line; tweet.text is the (possibly truncated) tweet body
with open('temp.txt', 'w', encoding='utf-8') as f:
    for tweet in results:
        f.write('{}\n'.format(tweet.text))

Burhan Khalid answered Sep 20 '22 02:09