I'm trying to use Tweepy to get the full list of followers of an account with around 500k followers. My code returns the usernames for small accounts (under 100 followers), but it fails as soon as an account has even 110 followers. Any help making it work with larger numbers is greatly appreciated!
Here's the code I have right now:
import tweepy
import time

key1 = "..."
key2 = "..."
key3 = "..."
key4 = "..."
accountvar = raw_input("Account name: ")

auth = tweepy.OAuthHandler(key1, key2)
auth.set_access_token(key3, key4)
api = tweepy.API(auth)

ids = []
for page in tweepy.Cursor(api.followers_ids, screen_name=accountvar).pages():
    ids.extend(page)
    time.sleep(60)

users = api.lookup_users(user_ids=ids)
for u in users:
    print u.screen_name
The error I keep getting is:
Traceback (most recent call last):
  File "test.py", line 24, in <module>
    users = api.lookup_users(user_ids=ids)
  File "/Library/Python/2.7/site-packages/tweepy/api.py", line 321, in lookup_users
    return self._lookup_users(post_data=post_data)
  File "/Library/Python/2.7/site-packages/tweepy/binder.py", line 239, in _call
    return method.execute()
  File "/Library/Python/2.7/site-packages/tweepy/binder.py", line 223, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{u'message': u'Too many terms specified in query.', u'code': 18}]
I've looked at a number of similar questions, but none of the solutions I found worked for me. If someone has a link to one that does, please send it to me!
I actually figured it out, so I'll post the solution here just for reference.
import tweepy
import time

key1 = "..."
key2 = "..."
key3 = "..."
key4 = "..."
accountvar = raw_input("Account name: ")

auth = tweepy.OAuthHandler(key1, key2)
auth.set_access_token(key3, key4)
api = tweepy.API(auth)

users = tweepy.Cursor(api.followers, screen_name=accountvar).items()

while True:
    try:
        user = next(users)
    except tweepy.TweepError:
        # Any TweepError is treated as a rate limit here: wait out the
        # 15-minute window, then retry inside the try block so that a
        # StopIteration or a second TweepError on the retry is still handled.
        time.sleep(60*15)
        continue
    except StopIteration:
        break
    print "@" + user.screen_name
This pauses for 15 minutes after every 300 names, and then continues, which keeps it from running into the rate limit. It will obviously take ages for large accounts, but as Leb mentioned:
The twitter API only allows 100 users to be searched for at a time...[so] what you'll need to do is iterate through each 100 users but staying within the rate limit.
You basically just have to leave the program running if you want the next set. I don't know why mine returns 300 at a time instead of 100; as I mentioned earlier, my program was returning 100 before as well.
Hope this helps anyone else that had the same problem as me, and shoutout to Leb for reminding me to focus on the rate limit.
To extend upon this:
You can harvest 3,000 users per 15 minutes by adding a count parameter:
users = tweepy.Cursor(api.followers, screen_name=accountvar, count=200).items()
This will call the Twitter API 15 times as per your version, but rather than the default count=20, each API call will return 200 (i.e. you get 3000 rather than 300).
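To put those numbers in perspective for the 500k-follower account from the question, here is a rough back-of-the-envelope sketch. It assumes the limits quoted above (15 calls per 15-minute window, 20 or 200 users per call); the function name `windows_needed` is made up for illustration, and a real run may be slower.

```python
import math

def windows_needed(total_followers, per_call=200, calls_per_window=15):
    """Number of 15-minute rate-limit windows needed to page through all followers."""
    per_window = per_call * calls_per_window
    return int(math.ceil(total_followers / float(per_window)))

# With count=200: 3000 users per window.
windows = windows_needed(500000)
print(windows)               # 167 windows
print(windows * 15 / 60.0)   # upper bound on the wait, in hours

# With the default count=20: 300 users per window.
print(windows_needed(500000, per_call=20))
```

So even with count=200, a 500k account takes on the order of 42 hours with this approach, versus roughly ten times that with the default page size.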
Twitter provides two ways to fetch followers:
The first approach fetches full follower objects directly (using followers/list in the Twitter API, or api.followers in Tweepy). As noted above, this yields at most 200 * 15 = 3K followers per 15-minute window.
The second approach involves two stages:
a) Fetch only the follower ids first (using followers/ids in the Twitter API, or api.followers_ids in Tweepy). You can get 5000 * 15 = 75K follower ids in each 15-minute window.
b) Look up their usernames or other data (using users/lookup in the Twitter API, or api.lookup_users in Tweepy). This is rate limited to about 100 * 180 = 18K lookups per 15-minute window.
Considering the rate limits, the second approach returns follower data 6 times faster than the first (18K versus 3K users per window). Below is code that uses the second approach:
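The "6 times faster" figure can be checked with a few lines of arithmetic. This is only a sketch using the per-window limits quoted in this thread (200 users per api.followers call, 15 calls per window); the constant names are made up for illustration, and the limits themselves are assumed to still be current.

```python
# Per-window limits as quoted in this thread (assumed, not verified here).
FOLLOWERS_LIST_PER_WINDOW = 200 * 15   # approach 1: full user objects via api.followers
FOLLOWER_IDS_PER_WINDOW = 5000 * 15    # approach 2, stage a: ids only
USER_LOOKUPS_PER_WINDOW = 100 * 180    # approach 2, stage b: users/lookup

# The two-stage approach can go no faster than its slower stage.
two_stage_per_window = min(FOLLOWER_IDS_PER_WINDOW, USER_LOOKUPS_PER_WINDOW)
speedup = two_stage_per_window / float(FOLLOWERS_LIST_PER_WINDOW)

print(two_stage_per_window)
print(speedup)
```

The lookup stage (18K per window) is the bottleneck, not the id fetch (75K per window), which is where the factor of 6 over the 3K-per-window first approach comes from.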
import traceback

# First, make sure wait_on_rate_limit is set to True when connecting through Tweepy.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Request 5000 follower ids per call, which gives 75K ids in every
# 15-minute window (15 requests can be made in each window).
followerids = []
for user_id in tweepy.Cursor(api.followers_ids, screen_name=accountvar, count=5000).items():
    followerids.append(user_id)
print(len(followerids))

# Look up ids 100 at a time, which allows 18K lookups in each 15-minute window.
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    print(u_count)
    try:
        # Step through the ids in chunks of 100. Stepping the range by 100
        # avoids sending an empty final chunk when u_count is a multiple of 100.
        for start in range(0, u_count, 100):
            fullusers.extend(
                api.lookup_users(user_ids=userids[start:start + 100])
            )
        return fullusers
    except Exception:
        traceback.print_exc()
        print('Something went wrong, quitting...')

# Call the function with the list of follower ids and the Tweepy API connection.
fullusers = get_usernames(followerids, api)
Hope this helps. A similar approach can be used to fetch friends' details by using api.friends_ids in place of api.followers_ids.
If you need more resources on the rate limit comparison and on the second approach, see the links below:
https://github.com/tweepy/tweepy/issues/627
https://labsblog.f-secure.com/2018/02/27/how-to-get-twitter-follower-data-using-python-and-tweepy/
The twitter API only allows 100 users to be searched for at a time. That's why no matter how many you input to it you'll get 100. The followers_ids call is giving you the correct number of users, but you're being limited by GET users/lookup.
What you'll need to do is iterate through each 100 users but staying within the rate limit.
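As a minimal sketch of that idea, the ids can be split into batches of at most 100 and looked up one batch at a time. The helpers `chunks`, `lookup_all`, and `fake_lookup` are hypothetical names made up for this example; `fake_lookup` merely stands in for the real lookup call so the snippet runs without network access or credentials.

```python
def chunks(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def lookup_all(ids, lookup):
    """Call `lookup` once per batch of up to 100 ids and collect the results."""
    results = []
    for batch in chunks(ids, 100):
        results.extend(lookup(batch))
    return results

# Stand-in for the real lookup call: it only echoes the ids it receives
# and checks that no batch exceeds the 100-id limit.
def fake_lookup(batch):
    assert len(batch) <= 100
    return list(batch)

# 110 ids (the case that failed in the question) are sent as batches of 100 and 10.
print(len(lookup_all(list(range(110)), fake_lookup)))
```

With the Tweepy version used in this thread, the real call would presumably be passed in as something like `lambda batch: api.lookup_users(user_ids=batch)`, with rate-limit handling left to `wait_on_rate_limit=True` or an explicit sleep.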