Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a slower or more controlled alternative to .apply()?

So this may seem like an odd question, but I have a pandas DataFrame with addresses in it, that I want to geocode so I can get the latitude and longitude.

I have code that works using .apply() thanks to this very helpful thread (new column with coordinates using geopy pandas), but my problem is that all of the open APIs have strict limits to how many requests per second they allow, and also requests per day.

I haven't been able to find any way to throttle my code so match the limits of the APIs. My DF has 25K rows, but I've only been able to successfully geocode if I create a subset of it with up to 5 rows.

I don't have a lot of experience with python and pandas, but in SAS the DATA steps iterate one row at a time, so I could have a sleep command that would throttle the requests. What would be the best way to implement something like that with python/pandas?

EDIT: So based on the answers so far, I wanted to confirm, my code would change from: df_small['city_coord'] = df_small['Address'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
to:

df_small = df_clean[:5]
def f(x, delay=1):
# run your code    
sleep(delay)
return geolocator.geocode(x)

df_small['city_coord'] = df_small['Address'].apply(f).apply(lambda x: (x.latitude, x.longitude))
like image 582
Michael Melillo Avatar asked Oct 14 '25 03:10

Michael Melillo


1 Answers

To iterate with a delay, you can use df.iterrows() and time.sleep():

from time import sleep

for row in df.iterrows():
    # run your code
    sleep(1) # how many seconds to wait

Or you can just put time.sleep() within the apply function itself (as @RafaelC suggests in the comments):

def f(x, delay=1):
    # run your code
    sleep(delay)

df.apply(f)
like image 164
ASGM Avatar answered Oct 16 '25 17:10

ASGM