How to optimize API calls for a large dataset using Python?

Objective: Send a list of addresses to an API and extract certain information (e.g., a flag indicating whether an address is in a flood zone or not).

Solution: Working Python script for small data.

Problem: I want to optimize my current solution for a large input. How can I improve the performance of the API calls? If I have 100,000 addresses, will my current solution fail? Will this slow down the HTTP calls? Will I get a request timeout? Can the API handle that many calls?

  • Input: a list of addresses

Sample input

777 Brockton Avenue, Abington MA 2351

30 Memorial Drive, Avon MA 2322

  • Output: A data frame of all addresses and a flag which indicates if it is in a flood zone or not.
  • API: https://hazards.fema.gov/gis/nfhl/rest/services/public/NFHL/MapServer/28/query.

My current solution works well for a small dataset.

# Create a function to geocode an address and query FEMA's NFHL for its flood zone
import json
import requests as req
from pandas import json_normalize
# `geocode` is assumed to come from the ArcGIS API for Python:
# from arcgis.geocoding import geocode

def zonedetect(addrs):
    global geolocate
    geocode_result = geocode(address=addrs, as_featureset=True)
    longitude = geocode_result.features[0].geometry.x  # Esri geometry: x is longitude
    latitude = geocode_result.features[0].geometry.y   # and y is latitude
    url = ("https://hazards.fema.gov/gis/nfhl/rest/services/public/NFHL/MapServer/28/query"
           "?where=1%3D1&geometry=" + str(longitude) + "%2C" + str(latitude) +
           "&geometryType=esriGeometryPoint&inSR=4326&spatialRel=esriSpatialRelIntersects"
           "&outFields=*&returnGeometry=true&f=json")
    response = req.get(url)

    # Exception handling: only parse the body on a successful response
    if response.status_code == 200:
        parsed_data = json.loads(response.text)
        formatted_data = json_normalize(parsed_data["features"])
        formatted_data["Address_1"] = addrs
        geolocate = geolocate.append(formatted_data, ignore_index=True)
    else:
        print("Request for {} failed".format(addrs))

# Read every address from the existing dataframe
for i in range(len(df.index)):
    zonedetect(df["Address"][i])

Instead of using the for loop above, is there an alternative? Can I process this logic in a batch?

asked Apr 11 '26 by shockwave

1 Answer

Sending 100,000 requests to the hazards.fema.gov server will certainly put some load on their server, but it will mostly hurt your script: you have to wait for every single HTTP request to be sent and answered in turn, which could take an extremely long time.

What would be better is to send one REST query for everything you need and then handle the logic afterwards. Looking at the REST API docs, the geometry URL parameter can accept an esriGeometryMultipoint. Here is an example of a multipoint:

{
  "points" : [[-97.06138,32.837],[-97.06133,32.836],[-97.06124,32.834],[-97.06127,32.832]],
  "spatialReference" : {"wkid" : 4326}
}

So what you can do is build a dict to store all the points you want to query:

multipoint = {"points": [], "spatialReference": {"wkid": 4326}}

And when you loop, append each geocoded point to the multipoint's points list (Esri points are [x, y], i.e. [longitude, latitude]):

for i in range(len(df.index)):
    address = df["Address"][i]
    geocode_result = geocode(address=address, as_featureset=True)
    longitude = geocode_result.features[0].geometry.x  # x is longitude
    latitude = geocode_result.features[0].geometry.y   # y is latitude
    multipoint["points"].append([longitude, latitude])

Then you can pass the multipoint as the geometry in your query (with geometryType=esriGeometryMultipoint), which results in just one API request instead of one per point.
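Putting this together, here is a minimal sketch of how the batched request could be built. The helper name `build_multipoint_params` is my own, not part of any library; the endpoint and parameter names come from the query URL in the question. Because the geometry JSON can get long, sending it as POST form data avoids URL-length limits:

```python
import json
# import requests  # uncomment to actually send the request

NFHL_URL = "https://hazards.fema.gov/gis/nfhl/rest/services/public/NFHL/MapServer/28/query"

def build_multipoint_params(points):
    """Build the query parameters for one batched NFHL request.

    `points` is a list of [longitude, latitude] pairs
    (Esri points are [x, y], so longitude comes first).
    """
    multipoint = {"points": points, "spatialReference": {"wkid": 4326}}
    return {
        "where": "1=1",
        "geometry": json.dumps(multipoint),
        "geometryType": "esriGeometryMultipoint",
        "inSR": "4326",
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "*",
        "returnGeometry": "true",
        "f": "json",
    }

# Usage sketch (requires network access):
# params = build_multipoint_params([[-97.06138, 32.837], [-97.06133, 32.836]])
# response = requests.post(NFHL_URL, data=params)
# features = response.json()["features"]
```

If 100,000 points turn out to be too large for a single request, the same helper can be applied to chunks of, say, 1,000 points at a time, which is still a hundred requests instead of 100,000.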

answered Apr 14 '26 by R10t--

