I'm looking to pull data from the Twitter API and create a pipe-separated file that I can do further processing on. My code currently looks like this:
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
out_file = "tweets.txt"
tweets = api.search(q='foo')
o = open(out_file, 'a')
for tweet in tweets:
    id = str(tweet.id)
    user = tweet.user.screen_name
    post = tweet.text
    post = post.encode('ascii', 'ignore')
    post = post.strip('|') # so pipes in tweets don't create unwanted separators
    post = post.strip('\r\n')
    record = id + "|" + user + "|" + post
    print>>o, record
I have a problem when a user's tweet includes line breaks, which makes the output data look like this:
473565810326601730|usera|this is a tweet
473565810325865901|userb|some other example
406478015419876422|userc|line
separated
tweet
431658790543289758|userd|one more tweet
I want to strip out the line breaks on the third tweet. I've tried post.strip('\n') and post.strip('0x0D 0x0A') in addition to the above but none seem to work. Any ideas?
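That is because strip returns "a copy of the string with leading and trailing characters removed". It only trims the ends of the string, so line breaks and pipes inside the tweet are left untouched.
You should use replace for both the newline and the pipe: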
post = post.replace('|', ' ')
post = post.replace('\n', ' ')
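If the tweets may also contain carriage returns (\r\n line endings), a slightly broader variant can help. The following is only a sketch, not part of the original code: clean_field is a hypothetical helper name, and it goes beyond the two replace calls above by collapsing every run of whitespace into a single space, which may or may not be what you want for your data.

```python
def clean_field(text, sep='|'):
    """Return text that is safe to use as one field of a sep-delimited record.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    # Replace the separator wherever it occurs, not only at the ends.
    text = text.replace(sep, ' ')
    # split() with no arguments splits on any run of whitespace
    # (spaces, \r, \n, \r\n), so re-joining flattens all line breaks.
    return ' '.join(text.split())


# strip() only trims the ends of the string: the interior newline survives.
stripped = "line\nseparated\n".strip('\n')

post = "line\r\nseparated\ntweet | with a pipe"
record = "|".join(["406478015419876422", "userc", clean_field(post)])
print(record)
```

Inside the loop, this would replace the two strip calls with post = clean_field(post). Note that it also squeezes repeated spaces, so use the plain replace calls instead if the original spacing matters.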