Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I retrieve all Tweets and attributes for a given user using Python?

I am attempting to retrieve data from Twitter, using Tweepy for a username typed at the command line. I'm wanting to extract quite a bit of data about the status and user,so have come up with the following:

Note that I am importing all the required modules ok and have oauth + keys (just not included it here) and filename is correct, just been changed:

# define user to get tweets for. accepts input from user
user = tweepy.api.get_user(input("Please enter the twitter username: "))

# Display basic details for twitter user name
print (" ")
print ("Basic information for", user.name)
print ("Screen Name:", user.screen_name)
print ("Name: ", user.name)
print ("Twitter Unique ID: ", user.id)
print ("Account created at: ", user.created_at)

timeline = api.user_timeline(screen_name=user, include_rts=True, count=100)
    for tweet in timeline:
        print ("ID:", tweet.id)
        print ("User ID:", tweet.user.id)
        print ("Text:", tweet.text)
        print ("Created:", tweet.created_at)
        print ("Geo:", tweet.geo)
        print ("Contributors:", tweet.contributors)
        print ("Coordinates:", tweet.coordinates) 
        print ("Favorited:", tweet.favorited)
        print ("In reply to screen name:", tweet.in_reply_to_screen_name)
        print ("In reply to status ID:", tweet.in_reply_to_status_id)
        print ("In reply to status ID str:", tweet.in_reply_to_status_id_str)
        print ("In reply to user ID:", tweet.in_reply_to_user_id)
        print ("In reply to user ID str:", tweet.in_reply_to_user_id_str)
        print ("Place:", tweet.place)
        print ("Retweeted:", tweet.retweeted)
        print ("Retweet count:", tweet.retweet_count)
        print ("Source:", tweet.source)
        print ("Truncated:", tweet.truncated)

I would like this eventually to iterate through all of a user's tweets (up to the 3200 limit). First things first though. So far though I have two problems, I get the following error message regarding retweets:

Please enter the twitter username: barackobamaTraceback (most recent call last):
  File " usertimeline.py", line 64, in <module>
    timeline = api.user_timeline(screen_name=user, count=100, page=1)
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 401
Traceback (most recent call last):
  File "usertimeline.py", line 42, in <module>
    user = tweepy.api.get_user(input("Please enter the twitter username: "))
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 404

Passing the username as a variable seems to be a problem also:

Traceback (most recent call last):
  File " usertimleline.py", line 64, in <module>
    timeline = api.user_timeline(screen_name=user, count=100, page=1)
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 401

I've isolated both these errors, i.e. they aren't working together.

Forgive my ignorance, I am not too hot with Twitter APIs but am learning pretty rapidly. Tweepy documentation really does suck and I've done loads of reading round on the net, just can't seem to get this fixed. If I can get this sorted, i'll be posting up some documentation.

I know how to transfer the data into an MySQL db once extracted (it will do that, rather than print to screen) and manipulate it so that I can do stuff with it, it is just getting it out that I am having the problems with. Does anyone have any ideas or is there another method I should be considering?

Any help really appreciated. Cheers

EDIT:

Following on from @Eric Olson's suggestion this morning; I did the following.

1) Created a completely brand new set of Oauth credentials to test. 2) Copied code across to a new script as follows:

Oauth

consumer_key = "(removed)"
consumer_secret = "(removed)"
access_key="88394805-(removed)"
access_secret="(removed)"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api=tweepy.API(auth)



# confirm account being used for OAuth
print ("API NAME IS: ", api.me().name)
api.update_status("Using Tweepy from the command line")

The first time i run the script, it works fine and updates my status and returns the API name as follows:

>>> 
API NAME IS:  Chris Howden

Then from that point on I get this:

Traceback (most recent call last):
  File "C:/Users/Chris/Dropbox/Uni_2012-3/6CC995 - Independent Studies/Scripts/get Api name and update status.py", line 19, in <module>
    api.update_status("Using Tweepy frm the command line")
  File "C:\Python32\lib\site-packages\tweepy-1.4-py3.2.egg\tweepy\binder.py", line 153, in _call
    raise TweepError(error_msg)
tweepy.error.TweepError: Twitter error response: status code = 403

The only reason I can see for it doing something like this is that it is rejecting the generated access token. I shouldn't need to renew the access token should I?

like image 482
chowden Avatar asked Mar 26 '13 02:03

chowden


1 Answers

If you're open to trying another library, you could give rauth a shot. There's already a Twitter example but if you're feeling lazy and just want a working example, here's how I'd modify that demo script:

from rauth import OAuth1Service

# Get a real consumer key & secret from https://dev.twitter.com/apps/new
twitter = OAuth1Service(
    name='twitter',
    consumer_key='J8MoJG4bQ9gcmGh8H7XhMg',
    consumer_secret='7WAscbSy65GmiVOvMU5EBYn5z80fhQkcFWSLMJJu4',
    request_token_url='https://api.twitter.com/oauth/request_token',
    access_token_url='https://api.twitter.com/oauth/access_token',
    authorize_url='https://api.twitter.com/oauth/authorize',
    base_url='https://api.twitter.com/1/')

request_token, request_token_secret = twitter.get_request_token()

authorize_url = twitter.get_authorize_url(request_token)

print 'Visit this URL in your browser: ' + authorize_url
pin = raw_input('Enter PIN from browser: ')

session = twitter.get_auth_session(request_token,
                                   request_token_secret,
                                   method='POST',
                                   data={'oauth_verifier': pin})

params = {'screen_name': 'github',  # User to pull Tweets from
          'include_rts': 1,         # Include retweets
          'count': 10}              # 10 tweets

r = session.get('statuses/user_timeline.json', params=params)

for i, tweet in enumerate(r.json(), 1):
    handle = tweet['user']['screen_name'].encode('utf-8')
    text = tweet['text'].encode('utf-8')
    print '{0}. @{1} - {2}'.format(i, handle, text)

You can run this as-is, but be sure to update the credentials! These are meant for demo purposes only.

Full disclosure, I am the maintainer of rauth.

like image 102
maxcountryman Avatar answered Sep 23 '22 06:09

maxcountryman