I am a novice Python programmer. I am having trouble extracting the text of a series of tweets with tweepy
and saving it to a text file (I omit the authentication and setup):
search = api.search("hello", count=10)
textlist = []
for i in range(0, len(search)):
    textlist.append(search[i].text.replace('\n', ''))
f = open('temp.txt', 'w')
for i in range(0, len(textlist)):
    f.write(textlist[i].encode('utf-8') + '\n')
But in some long tweets the text at the end is truncated, and a three dot character "..." appears at the end of each string, so sometimes I lose links or hashtags. How can I avoid this?
If we want the complete text, pass another parameter, tweet_mode="extended", to the call. Without it, the text attribute of the returned object may be truncated; with it, fetch the full_text attribute instead.
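As a rough illustration of the point above, here is a minimal sketch that writes one tweet per line to a file, preferring full_text when it is present. The helper names tweet_text and save_texts are hypothetical, not part of Tweepy, and the usage comment assumes `api` is an already authenticated tweepy.API object (in Tweepy 4.x the search method is named search_tweets rather than search).

```python
import io


def tweet_text(status):
    """Return full_text when present (extended mode), else fall back to text."""
    return getattr(status, "full_text", None) or status.text


def save_texts(statuses, path):
    """Write one tweet per line, UTF-8 encoded, with newlines flattened to spaces."""
    with io.open(path, "w", encoding="utf-8") as f:
        for status in statuses:
            f.write(tweet_text(status).replace("\n", " ") + "\n")


# Usage (assumption: `api` is an authenticated tweepy.API instance):
# results = api.search("hello", count=10, tweet_mode="extended")
# save_texts(results, "temp.txt")
```

Opening the file with an explicit encoding avoids the manual encode('utf-8') call from the question.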
If you need more than 100 Tweets, you have to use the paginator method and specify the limit, i.e. the total number of Tweets that you want. Replace limit=1000 with the maximum number of tweets you want.
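A possible shape for the paginated request, as a sketch only: it assumes Tweepy 4.x, a tweepy.Client authenticated for the Twitter API v2, and the recent-search endpoint. The function name fetch_tweet_texts and the paginator_cls parameter are hypothetical; the latter exists only so the helper can be tried without network access.

```python
def fetch_tweet_texts(client, query, limit=1000, paginator_cls=None):
    """Collect up to `limit` tweet texts for `query`, 100 tweets per request."""
    if paginator_cls is None:
        import tweepy  # imported lazily; only needed for real requests
        paginator_cls = tweepy.Paginator
    paginator = paginator_cls(
        client.search_recent_tweets,  # API v2 recent search
        query=query,
        max_results=100,  # page size per request
    )
    # flatten() iterates over individual tweets across pages, stopping at `limit`
    return [tweet.text for tweet in paginator.flatten(limit=limit)]


# Usage (assumption: a valid bearer token is available):
# import tweepy
# client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")
# texts = fetch_tweet_texts(client, "hello", limit=1000)
```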
With tweepy, you can get the full text using tweet_mode='extended'
(not documented in the Tweepy doc). For instance:
(not extended)
print(api.get_status('862328512405004288')._json['text'])
@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue d… https://t.co/kALZ2ki9Vc
(extended)
print(api.get_status('862328512405004288', tweet_mode='extended')._json['full_text'])
@tousuncotefoot @equipedefrance @CreditAgricole @AntoGriezmann @KMbappe @layvinkurzawa @UmtitiSam J'ai jamais vue de match de foot et cela ferait un beau cadeau pour mon copain !!
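One caveat with extended mode: for a retweet, the full_text of the retweet itself may still end in an ellipsis, while the original tweet is available under the retweeted_status attribute. The sketch below handles that case; the function name complete_text is hypothetical, and it assumes the Status object was fetched with tweet_mode='extended'.

```python
def complete_text(status):
    """Best-effort full text of a Status fetched with tweet_mode='extended'."""
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        # The retweet's own full_text may be truncated; the original tweet
        # carries the untruncated text (without the "RT @user:" prefix).
        return original.full_text
    return status.full_text


# Usage (assumption: `api` is an authenticated tweepy.API instance):
# status = api.get_status('862328512405004288', tweet_mode='extended')
# print(complete_text(status))
```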
The ... (ellipsis) is added when the tweet is part of a retweet (and is thus truncated). This is mentioned in the documentation:
Indicates whether the value of the text parameter was truncated, for example, as a result of a retweet exceeding the 140 character Tweet length. Truncated text will end in ellipsis, like this ...
There is no way to avoid this unless you take each individual tweet, search for any retweets of it, and build the complete timeline. This obviously isn't practical for a simple search, though you could do it if you were fetching a particular handle's timeline.
You can also simplify your code:
import io

results = api.search('hello', count=10)
with io.open('temp.txt', 'w', encoding='utf-8') as f:
    for tweet in results:
        # tweet is a Status object; its text attribute is already a Unicode string
        f.write(u'{}\n'.format(tweet.text))