 

Python - Run multiple get requests in parallel and stop on first response

Python 3.4

requests 2.18

I want to make multiple requests.get calls and return the first that gets a response, stopping the rest. What's the best way to do this?

Edit: I ended up using Python's multiprocessing module and a queue to achieve this, as petre suggested in the comments. Here's the code:

from multiprocessing import Queue, Process
from queue import Empty  # multiprocessing's Queue.get raises the standard queue.Empty
from requests import head


def _req(url, queue):
    # probe the URL; only a 200 response counts as an answer
    a = head(url)
    if a.status_code == 200:
        queue.put(a)


def head_all_get_first(urls):
    jobs = []
    q = Queue()
    for url in urls:
        p = Process(target=_req, args=(url, q))
        jobs.append(p)
    for p in jobs:
        p.start()
    try:
        ret = q.get(timeout=20)  # blocking get - wait at most 20 seconds for an answer
    except Empty:  # raised if no process answered within the timeout
        ret = None
    for p in jobs:
        p.terminate()  # stop the slower processes - only the first answer matters
    return ret
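
For completeness, a minimal usage sketch ( the mirror URLs are illustrative only ). On Windows the call must sit under an if __name__ == '__main__': guard, since multiprocessing there spawns fresh interpreters that re-import this module:

if __name__ == '__main__':
    mirrors = ['https://example.com/a', 'https://example.org/a']  # illustrative URLs
    first = head_all_get_first(mirrors)
    if first is not None:
        print(first.url, first.status_code)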
asked Oct 16 '25 by Bobs Burgers


1 Answer

Let's review the possibilities:

Python uses a GIL-based execution stepping: inside one interpreter process the bytecode execution is principally pure-[SERIAL], so the maximum one can reach with threads is a "just"-[CONCURRENT] flow of code-execution, not a true-[PARALLEL] system scheduling ( further details go way beyond this post and are not needed for the options below ). Network I/O is the notable exception: the GIL is released while a thread blocks on a socket, which is exactly what a requests call spends most of its time doing.

Your description could thus be fulfilled with either of:
- using a distributed system ( a coordinated, multi-agent, bidirectionally talking network of processors )
- using a GIL-multiplexed, thread-based backend inside one python process ( as in threading or concurrent.futures; a minimal sketch follows this list )
- using a GIL-independent, process-based backend ( as in multiprocessing or joblib )
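
For the thread-based option, a minimal sketch using the stdlib concurrent.futures, assuming requests releases the GIL while blocked on the socket ( it does ); fetch_first and its 20-second timeout are illustrative choices, not a fixed API:

import requests
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed


def fetch_first(urls, timeout=20):
    pool = ThreadPoolExecutor(max_workers=len(urls))  # one thread per URL
    futures = [pool.submit(requests.head, url, timeout=timeout) for url in urls]
    winner = None
    try:
        for future in as_completed(futures, timeout=timeout):
            try:
                response = future.result()
            except requests.RequestException:
                continue  # this URL failed; keep waiting for the others
            if response.status_code == 200:
                winner = response  # first 200 wins
                break
    except TimeoutError:
        pass  # nobody answered within the timeout
    pool.shutdown(wait=False)  # do not block on the slower threads
    return winner

The already-running stragglers cannot be killed, only abandoned; they finish their HTTP calls in the background and get garbage-collected.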

In all cases, the sum of the "costs" of setting up and terminating such a code-execution infrastructure will differ. A process-based backend is the most expensive one, as it always first creates a full copy ( or a fresh re-build ) of the initial state of the python interpreter, including all its memory allocations et al. That is very expensive for "small pieces of meat to process", so if in a need for speed, shaving latency down to fractions of milliseconds, one may straight exclude this option.
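
To get a feel for that setup cost, a rough local measurement sketch ( absolute numbers vary wildly with the platform and the start-method, so treat them as an illustration only ):

import time
from multiprocessing import Process


def _noop():
    pass


if __name__ == '__main__':
    start = time.perf_counter()
    p = Process(target=_noop)
    p.start()  # pay the full process setup cost here
    p.join()   # ...and the termination cost here
    print('one Process round-trip took %.1f ms' % ((time.perf_counter() - start) * 1e3))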

A nice example of a ( best semi-persistent ) multi-agent network uses ZeroMQ or nanomsg as the inter-agent mediation tool. Their wide range of transport classes { inproc:// | ipc:// | tcp:// | ... | vmci:// } allows one to operate smart solutions in both local and distributed layouts, and reaching a "first answered" state can be smartly, quickly and cheaply "advertised" to all the rest of the group of co-operating agents, which were not the first one.

[ figure: the classic ZeroMQ pipeline - a Ventilator PUSH-es tasks to parallel Workers, a Sink PULL-s in their results ]

Both the Ventilator and the Sink roles could live in the same python process, be they a class-method or some locally-multiplexed parts of an imperative code, and the uniform PUSH-strategy could grow smarter if non-homogeneous tasking gets implemented. Yet the principle of operating a smart network of multi-agent processing is the way to go.
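
A minimal pyzmq sketch of that "advertise to the rest" idea; the port numbers, the KILL broadcast and the socket layout are illustrative assumptions, not a fixed protocol:

import requests
import zmq


def sink():
    ctx = zmq.Context.instance()
    results = ctx.socket(zmq.PULL)   # workers PUSH their answers here
    results.bind('tcp://*:5558')
    control = ctx.socket(zmq.PUB)    # the Sink broadcasts a stop-signal here
    control.bind('tcp://*:5559')
    first = results.recv()           # block until the first worker answers
    control.send_string('KILL')      # cheaply "advertise" to the rest of the group
    return first


def worker(url):
    ctx = zmq.Context.instance()
    result = ctx.socket(zmq.PUSH)
    result.connect('tcp://localhost:5558')
    control = ctx.socket(zmq.SUB)
    control.connect('tcp://localhost:5559')
    control.setsockopt_string(zmq.SUBSCRIBE, '')
    response = requests.head(url)    # the actual probe
    if control.poll(0):              # a KILL already arrived - someone was faster
        return
    if response.status_code == 200:
        result.send_string(response.url)  # hand the answer over to the Sink

In a real deployment one would also poll the control socket during the request itself and mind the PUB/SUB slow-joiner caveat, so that late subscribers do not miss the KILL.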

answered Oct 17 '25 by user3666197


