
Stripping Line Breaks in Tweets via Tweepy

I'm looking to pull data from the Twitter API and create a pipe-separated file that I can do further processing on. My code currently looks like this:

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

out_file = "tweets.txt"

tweets = api.search(q='foo')
o = open(out_file, 'a')

for tweet in tweets:
        id = str(tweet.id)
        user = tweet.user.screen_name
        post = tweet.text
        post = post.encode('ascii', 'ignore')
        post = post.strip('|') # so pipes in tweets don't create unwanted separators
        post = post.strip('\r\n')
        record = id + "|" + user + "|" + post
        print>>o, record

I have a problem when a user's tweet includes line breaks, which makes the output data look like this:

473565810326601730|usera|this is a tweet 
473565810325865901|userb|some other example 
406478015419876422|userc|line 
separated 
tweet
431658790543289758|userd|one more tweet

I want to strip out the line breaks in the third tweet. I've tried post.strip('\n') and post.strip('0x0D 0x0A') in addition to the above, but none of them seems to work. Any ideas?

Kevin asked Oct 01 '22 11:10

1 Answer

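That is because strip returns "a copy of the string with leading and trailing characters removed". It only looks at the two ends of the string, so line breaks (or pipes) in the middle of a tweet are left untouched.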
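
The difference is easy to see on a small made-up string. This is only a sketch in Python 3 (the question's code is Python 2), and the sample text is invented for illustration:

```python
# Hypothetical sample tweet text with line breaks at the ends and in the middle.
post = "\nline\nseparated\ntweet\r\n"

# strip() removes the listed characters from the two ends of the string only.
stripped = post.strip('\r\n')

# replace() substitutes every occurrence, wherever it appears in the string.
replaced = post.replace('\r', ' ').replace('\n', ' ')

print(repr(stripped))  # the interior '\n' characters are still there
print(repr(replaced))  # no '\r' or '\n' left anywhere
```
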

You should use replace instead, for both the newline and the pipe, since it substitutes every occurrence in the string:

post = post.replace('|', ' ')
post = post.replace('\n', ' ')
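
If tweets may also contain carriage returns or tabs, one possible variation is to collapse every run of whitespace in a single step. The sketch below is Python 3 and assumes the tweet values have already been pulled out of the API response; clean_field and make_record are hypothetical helper names, not part of tweepy:

```python
def clean_field(text, sep="|"):
    """Return text that is safe to write as one field of a sep-delimited line."""
    # Replace the separator everywhere, not only at the ends.
    text = text.replace(sep, " ")
    # str.split() with no arguments splits on any run of whitespace,
    # including '\r', '\n' and '\t'; joining collapses each run to one space.
    return " ".join(text.split())


def make_record(tweet_id, user, post, sep="|"):
    """Build one output line from already-extracted tweet values."""
    return sep.join([str(tweet_id), user, clean_field(post, sep)])


# Made-up example values, modelled on the output shown in the question.
record = make_record(406478015419876422, "userc",
                     "line \nseparated \r\ntweet | with pipe")
print(record)
```

Note that this also trims leading and trailing whitespace and squeezes repeated spaces, which may or may not be wanted for later processing.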

Juan E. answered Oct 04 '22 10:10