Limiting/throttling the rate of HTTP requests in GRequests

Tags: python, http, python-requests, throttling, rate-limiting

I'm writing a small script in Python 2.7.3 with GRequests and lxml that will let me gather some collectible card prices from various websites and compare them. The problem is that one of the websites limits the number of requests and sends back HTTP error 429 if I exceed it.

Is there a way to throttle the number of requests in GRequests so that I don't exceed the number of requests per second I specify? Also, how can I make GRequests retry after some time if HTTP 429 occurs?

On a side note - their limit is ridiculously low. Something like 8 requests per 15 seconds. I breached it with my browser on multiple occasions just refreshing the page waiting for price changes.


asked Nov 27 '13 16:11

Bartłomiej Siwek

2 Answers

Going to answer my own question since I had to figure this out by myself and there seems to be very little info on this going around.

The idea is as follows. Every request object used with GRequests can take a session object as a parameter when created. Session objects, in turn, can have HTTP adapters mounted that are used when making requests. By creating our own adapter we can intercept requests and rate-limit them in whatever way works best for our application. In my case I ended up with the code below.

Object used for throttling:

import datetime

DEFAULT_BURST_WINDOW = datetime.timedelta(seconds=5)
DEFAULT_WAIT_WINDOW = datetime.timedelta(seconds=15)


class BurstThrottle(object):
    max_hits = None
    hits = None
    burst_window = None
    total_window = None
    timestamp = None

    def __init__(self, max_hits, burst_window, wait_window):
        self.max_hits = max_hits
        self.hits = 0
        self.burst_window = burst_window
        self.total_window = burst_window + wait_window
        self.timestamp = datetime.datetime.min

    def throttle(self):
        # Returns timedelta(0) if a request may go out now,
        # otherwise the time left until the current window ends.
        now = datetime.datetime.utcnow()
        if now < self.timestamp + self.total_window:
            if (now < self.timestamp + self.burst_window) and (self.hits < self.max_hits):
                self.hits += 1
                return datetime.timedelta(0)
            else:
                return self.timestamp + self.total_window - now
        else:
            # The previous window is over: start a new one with this request.
            self.timestamp = now
            self.hits = 1
            return datetime.timedelta(0)
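To see how the throttle behaves over time, here is a minimal sketch that replays the same decision logic against a hand-fed clock. The `now` argument and the name `ClockedBurstThrottle` are additions made only for this demonstration and are not part of the class above; the numbers (2 hits, 5 s burst, 15 s wait) are arbitrary.

```python
import datetime


class ClockedBurstThrottle(object):
    """Same decision logic as BurstThrottle, but the caller supplies `now`."""

    def __init__(self, max_hits, burst_window, wait_window):
        self.max_hits = max_hits
        self.hits = 0
        self.burst_window = burst_window
        self.total_window = burst_window + wait_window
        self.timestamp = datetime.datetime.min

    def throttle(self, now):
        if now < self.timestamp + self.total_window:
            if now < self.timestamp + self.burst_window and self.hits < self.max_hits:
                self.hits += 1
                return datetime.timedelta(0)
            return self.timestamp + self.total_window - now
        self.timestamp = now
        self.hits = 1
        return datetime.timedelta(0)


def seconds(n):
    return datetime.timedelta(seconds=n)


start = datetime.datetime(2013, 11, 27, 16, 0, 0)
t = ClockedBurstThrottle(max_hits=2, burst_window=seconds(5), wait_window=seconds(15))

waits = [
    t.throttle(start).total_seconds(),                # first hit opens a new window
    t.throttle(start + seconds(1)).total_seconds(),   # second hit, still inside the burst
    t.throttle(start + seconds(2)).total_seconds(),   # burst used up: wait until start + 20 s
    t.throttle(start + seconds(20)).total_seconds(),  # window over: a new one opens
]
print(waits)  # [0.0, 0.0, 18.0, 0.0]
```

Note that once the burst is used up, every caller is told to wait until the whole window (burst plus wait) has passed, even if the burst window itself has not ended yet.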

HTTP adapter:

import datetime

import gevent
import requests

# BurstThrottle, DEFAULT_BURST_WINDOW and DEFAULT_WAIT_WINDOW are defined above.


class MyHttpAdapter(requests.adapters.HTTPAdapter):
    throttle = None

    def __init__(self, pool_connections=requests.adapters.DEFAULT_POOLSIZE,
                 pool_maxsize=requests.adapters.DEFAULT_POOLSIZE, max_retries=requests.adapters.DEFAULT_RETRIES,
                 pool_block=requests.adapters.DEFAULT_POOLBLOCK, burst_window=DEFAULT_BURST_WINDOW,
                 wait_window=DEFAULT_WAIT_WINDOW):
        # The pool size doubles as the number of requests allowed per burst.
        self.throttle = BurstThrottle(pool_maxsize, burst_window, wait_window)
        super(MyHttpAdapter, self).__init__(pool_connections=pool_connections, pool_maxsize=pool_maxsize,
                                            max_retries=max_retries, pool_block=pool_block)

    def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
        request_successful = False
        response = None
        while not request_successful:
            # Sleep (yielding to other greenlets) until the throttle lets us through.
            wait_time = self.throttle.throttle()
            while wait_time > datetime.timedelta(0):
                gevent.sleep(wait_time.total_seconds(), ref=True)
                wait_time = self.throttle.throttle()

            response = super(MyHttpAdapter, self).send(request, stream=stream, timeout=timeout,
                                                       verify=verify, cert=cert, proxies=proxies)

            # Retry on HTTP 429; note that the number of retries is not capped.
            if response.status_code != 429:
                request_successful = True

        return response
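The retry loop above has no upper bound and ignores any hint from the server. A server may send a `Retry-After` header along with a 429, so one possible refinement is to derive the wait from that header and fall back to capped exponential backoff. The helper below is only a sketch: the name `retry_delay` and its defaults are assumptions, it handles only the numeric (seconds) form of the header, and it takes a plain dict, whereas the headers on a requests response are case-insensitive.

```python
def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Return how many seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper: prefers a numeric Retry-After header, otherwise
    falls back to capped exponential backoff (base * 2 ** attempt).
    """
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return min(max(float(value), 0.0), cap)
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch
    return min(base * (2 ** attempt), cap)


print(retry_delay({"Retry-After": "15"}, attempt=0))  # 15.0
print(retry_delay({}, attempt=3))                     # 8.0
print(retry_delay({}, attempt=10))                    # 60.0
```

Inside `send`, a value like this could be passed to `gevent.sleep` before the next attempt, and the loop could give up after a fixed number of attempts instead of retrying forever.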

Setup:

import datetime

import grequests
import requests

# Assumes MyHttpAdapter lives in a local module named `adapter`, and that
# __CONCURRENT_LIMIT__, urls and handle_response are defined elsewhere.
requests_adapter = adapter.MyHttpAdapter(
    pool_connections=__CONCURRENT_LIMIT__,
    pool_maxsize=__CONCURRENT_LIMIT__,
    max_retries=0,
    pool_block=False,
    burst_window=datetime.timedelta(seconds=5),
    wait_window=datetime.timedelta(seconds=20))

requests_session = requests.session()
requests_session.mount('http://', requests_adapter)
requests_session.mount('https://', requests_adapter)

unsent_requests = (grequests.get(url,
                                 hooks={'response': handle_response},
                                 session=requests_session) for url in urls)
grequests.map(unsent_requests, size=__CONCURRENT_LIMIT__)


answered Oct 04 '22 00:10

Bartłomiej Siwek

Take a look at this for automatic requests throttling: https://pypi.python.org/pypi/RequestsThrottler/0.2.2

You can either set a fixed delay between requests or set a number of requests to send in a fixed number of seconds (which is basically the same thing):

import requests
from requests_throttler import BaseThrottler

request = requests.Request(method='GET', url='http://www.google.com')
reqs = [request for i in range(0, 5)]  # An example list of requests

with BaseThrottler(name='base-throttler', delay=1.5) as bt:
    throttled_requests = bt.multi_submit(reqs)

where the function multi_submit returns a list of ThrottledRequest objects (see doc: link at the end).

You can then access the responses:

for tr in throttled_requests:
    print(tr.response)

Alternatively, you can achieve the same by specifying the number of requests to send in a fixed amount of time (e.g. 15 requests every 60 seconds):

import requests
from requests_throttler import BaseThrottler

request = requests.Request(method='GET', url='http://www.google.com')
reqs = [request for i in range(0, 5)]  # An example list of requests

with BaseThrottler(name='base-throttler', reqs_over_time=(15, 60)) as bt:
    throttled_requests = bt.multi_submit(reqs)
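As a rough worked example of why the two settings are "basically the same thing": if requests are spaced evenly, N requests per T seconds corresponds to a delay of T / N seconds between requests. This is plain arithmetic, not a description of how RequestsThrottler schedules requests internally, and `even_delay` is a hypothetical helper name.

```python
def even_delay(requests_count, period_seconds):
    """Delay between evenly spaced requests for a given request budget."""
    if requests_count <= 0:
        raise ValueError("requests_count must be positive")
    return float(period_seconds) / requests_count


print(even_delay(15, 60))  # 4.0, the (15, 60) example above
print(even_delay(8, 15))   # 1.875, the limit mentioned in the question
```

So for the limit described in the question (about 8 requests per 15 seconds), a delay a little above 1.875 seconds should stay under it.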

Both solutions can also be implemented without the with statement:

import requests
from requests_throttler import BaseThrottler

request = requests.Request(method='GET', url='http://www.google.com')
reqs = [request for i in range(0, 5)]  # An example list of requests

bt = BaseThrottler(name='base-throttler', delay=1.5)
bt.start()
throttled_requests = bt.multi_submit(reqs)
bt.shutdown()

For more details: http://pythonhosted.org/RequestsThrottler/index.html

answered Oct 03 '22 23:10

se7entyse7en
