 

Limiting number of HTTP requests per second on Python

I've written a script that fetches URLs from a file and sends HTTP requests to all the URLs concurrently. I now want to limit the number of HTTP requests per second and the bandwidth per interface (eth0, eth1, etc.) in a session. Is there any way to achieve this on Python?

Asked Sep 29 '14 by Naveen

People also ask

How do you limit limits per second in Python?

Add a wait inside your workers so they pause between requests (in the queue example from the documentation: inside the "while True" loop, after task_done). Example: 5 worker threads each waiting 1 second between requests will do fewer than 5 fetches per second.
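A minimal sketch of that pattern, using only the standard library. The fetch() body here is a placeholder so the sketch runs offline; in a real script you would substitute an actual HTTP call such as requests.get(u).

```python
import queue
import threading
import time

task_queue = queue.Queue()
results = []
lock = threading.Lock()

def fetch(url):
    # Placeholder for the real HTTP request (e.g. requests.get(url)).
    with lock:
        results.append(url)

def worker(delay):
    while True:
        url = task_queue.get()
        if url is None:          # sentinel: shut this worker down
            task_queue.task_done()
            break
        fetch(url)
        task_queue.task_done()
        time.sleep(delay)        # wait between requests (rate limiting)

urls = ['http://example.com/%d' % i for i in range(10)]
for u in urls:
    task_queue.put(u)

# 5 workers; a delay of 1.0 here would mean fewer than 5 fetches/second.
threads = [threading.Thread(target=worker, args=(0.01,)) for _ in range(5)]
for t in threads:
    t.start()
for _ in threads:
    task_queue.put(None)         # one sentinel per worker
for t in threads:
    t.join()

print(len(results))
```

With a delay of 1 second and 5 workers, the sustained rate stays below 5 requests per second, since each worker sleeps after every fetch.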

How do you make HTTP request faster in Python?

Solution #1: the synchronous way. We can leverage the Session object to further increase speed. The Session object uses urllib3's connection pooling, which means that for repeated requests to the same host, the Session's underlying TCP connection is re-used, yielding a performance gain.
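A sketch of that pattern with the third-party requests library; fetch_many is a hypothetical helper name, and the URLs would come from your own list.

```python
import requests

# One shared Session: connections to the same host are pooled and
# re-used across calls instead of opening a new TCP connection each time.
session = requests.Session()

def fetch_many(urls):
    # Hypothetical helper: fetch each URL over the shared session.
    return [session.get(u).status_code for u in urls]
```

Compared with calling requests.get() in a loop, the shared Session avoids repeated TCP (and TLS) handshakes to the same host.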

How does Python handle multiple requests at the same time?

Use multiple servers. One way to handle multiple requests is to have multiple physical servers: each request is given to a server that is free, and that server services it. This approach is inefficient if you keep adding servers without effectively using the resources of the existing ones.

Is aiohttp better than requests?

The difference from requests.get is that requests fetches the whole body of the response at once and keeps it, but aiohttp doesn't. aiohttp lets you ignore the body, read it in chunks, or read it after looking at the headers/status code. That's why you need a second await: aiohttp needs to do more I/O to get the response body.
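A sketch of that two-step pattern with aiohttp (a third-party library): the status is available as soon as the headers arrive, and fetching the body is a separate await.

```python
import asyncio

import aiohttp

async def fetch_status_and_body(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            status = resp.status        # available before the body is read
            body = await resp.read()    # second await: separate I/O step
            return status, len(body)

# To run it for real:
#     asyncio.run(fetch_status_and_body('http://example.com'))
```

You could also skip resp.read() entirely when only the status or headers matter, saving the body download.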


1 Answer

You could use a Semaphore object, which is part of Python's standard threading library (see the threading module docs).

Or, if you want to work with threads directly, you could use wait(timeout) (for example, threading.Event.wait).

There is no library bundled with Python that can shape traffic on an Ethernet or other network interface. The lowest level you can go is the socket module.
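Since per-interface shaping isn't available from Python, one workaround is to throttle bandwidth at the application layer: read the response in chunks and sleep so the average rate stays under a bytes-per-second budget. This sketch (read_throttled is a hypothetical helper) throttles any file-like stream; for HTTP you could pass a streamed response body instead of the in-memory buffer used here.

```python
import io
import time

def read_throttled(stream, max_bps, chunk_size=1024):
    """Read a file-like stream, sleeping to stay under max_bps bytes/sec."""
    data = bytearray()
    start = time.monotonic()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data += chunk
        # How long this many bytes *should* have taken at max_bps,
        # versus how long they actually took; sleep off the difference.
        expected = len(data) / max_bps
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
    return bytes(data)

payload = b'x' * 4096
out = read_throttled(io.BytesIO(payload), max_bps=1_000_000)
```

This only limits what your own process reads, not what the interface carries; for true per-interface limits you would use OS-level tools (e.g. tc on Linux) outside Python.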

Based on your reply, here's my suggestion. Notice the active_count() call: use it only to verify that your script runs no more than two worker threads. Well, in this case the count will be three, because thread number one is your script itself, and then you have two URL-request threads.

import threading

import requests

# Limit the number of concurrent threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    try:
        # Request the passed URL.
        r = requests.get(u)
        print(r.status_code)
    finally:
        # Release the lock for other threads, even if the request failed.
        pool.release()
    # Show the number of active threads.
    print(threading.active_count())

def req():
    # Get URLs from a text file, removing whitespace.
    with open('urllist.txt') as f:
        urls = [url.strip() for url in f]
    for u in urls:
        # Thread pool: block when more threads than the set limit
        # are already running.
        pool.acquire(blocking=True)
        # Create a new thread, passing each URL (the u parameter)
        # to the worker function.
        t = threading.Thread(target=worker, args=(u,))
        # Start the newly created thread.
        t.start()

req()
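Note that the BoundedSemaphore above caps concurrency (two in-flight requests), not requests per second, which is what the question asked for. A minimal per-second limiter can instead space out the thread starts. In this sketch the request itself is stubbed out (do_request is a hypothetical placeholder) so it runs offline; replace its body with requests.get(u) in a real script.

```python
import threading
import time

RATE = 5                 # maximum requests per second
interval = 1.0 / RATE    # minimum spacing between thread starts
started = []
lock = threading.Lock()

def do_request(u):
    # Placeholder for the real HTTP request (e.g. requests.get(u)).
    with lock:
        started.append(u)

def rate_limited(urls):
    threads = []
    for u in urls:
        t = threading.Thread(target=do_request, args=(u,))
        t.start()
        threads.append(t)
        time.sleep(interval)   # at most RATE thread starts per second
    for t in threads:
        t.join()

rate_limited(['u%d' % i for i in range(3)])
```

You can combine both mechanisms: the semaphore bounds how many requests are in flight at once, while the sleep between starts bounds the launch rate.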
Answered Sep 23 '22 by Georgi