I am working with Python (IPython & Canopy) and a RESTful content API, on my local machine (Mac).
I have an array of 3000 unique IDs for which to pull data from the API, and I can only call the API with one ID at a time.
I was hoping somehow to make 3 sets of 1000 calls in parallel to speed things up.
What is the best way of doing this?
Thanks in advance for any help!
Without more information about what you are doing in particular, it is hard to say for sure, but a simple threaded approach may make sense: the work here is dominated by waiting on network I/O, which is exactly the case where Python threads help despite the GIL.
Assuming you have a simple function that processes a single ID:
import requests

url_t = "http://localhost:8000/records/%i"

def process_id(id):
    """process a single ID"""
    # fetch the data
    r = requests.get(url_t % id)
    # parse the JSON reply
    data = r.json()
    # and update some data with PUT
    requests.put(url_t % id, data=data)
    return data
You can expand that into a simple function that processes a range of IDs:
def process_range(id_range, store=None):
    """process a number of ids, storing the results in a dict"""
    if store is None:
        store = {}
    for id in id_range:
        store[id] = process_id(id)
    return store
and finally, you can fairly easily map sub-ranges onto threads to allow some number of requests to be concurrent:
from threading import Thread

def threaded_process_range(nthreads, id_range):
    """process the id range in a specified number of threads"""
    store = {}
    threads = []
    # create the threads, giving each an interleaved slice of the IDs
    for i in range(nthreads):
        ids = id_range[i::nthreads]
        t = Thread(target=process_range, args=(ids, store))
        threads.append(t)
    # start the threads
    for t in threads:
        t.start()
    # wait for the threads to finish
    for t in threads:
        t.join()
    return store
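For the numbers in your question (3000 IDs across 3 workers), the call might look like this; `ids` here is just a stand-in for your actual list:

# stand-in for your actual list of 3000 unique IDs
ids = list(range(3000))
# three threads, each handling roughly 1000 IDs
store = threaded_process_range(3, ids)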
A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094
If your individual tasks take a more widely varied amount of time, you may want to use a ThreadPool, which assigns jobs one at a time (often slower if individual tasks are very small, but it guarantees better balance in heterogeneous cases).
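For reference, a minimal sketch of that approach using the standard library's multiprocessing.pool.ThreadPool; the name pooled_process_range is just illustrative:

from multiprocessing.pool import ThreadPool

def pooled_process_range(nthreads, id_range):
    """process each ID as its own job, handed out one at a time"""
    pool = ThreadPool(nthreads)
    try:
        # chunksize=1 hands out one ID per job; map blocks until all are done
        results = pool.map(process_id, id_range, chunksize=1)
    finally:
        pool.close()
        pool.join()
    # zip the ids back together with their results
    return dict(zip(id_range, results))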