Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to give delay between each requests in scrapy?

I don't want to crawl simultaneously and get blocked. I would like to send one request per second.

like image 609
nizam.sp Avatar asked Jan 07 '12 08:01

nizam.sp


People also ask

Is Scrapy asynchronous?

Scrapy is written with Twisted, a popular event-driven networking framework for Python. Thus, it's implemented using a non-blocking (aka asynchronous) code for concurrency.

What is concurrent request in Scrapy?

Basically it is a daemon that listens to requests for spiders to run. Scrapyd runs spiders in multiple processes, you can control the behavior with max_proc and max-proc-per-cpu settings: max_proc. The maximum number of concurrent Scrapy process that will be started.

What is AutoThrottle in Scrapy?

This is an extension for automatically throttling crawling speed based on load of both the Scrapy server and the website you are crawling.

Is Scrapy good for web scraping?

Scrapy, being one of the most popular web scraping frameworks, is a great choice if you want to learn how to scrape data from the web. In this tutorial, you'll learn how to get started with Scrapy and you'll also implement an example project to scrape an e-commerce website.


1 Answers

There is a setting for that:

DOWNLOAD_DELAY

Default: 0

The amount of time (in secs) that the downloader should wait before downloading consecutive pages from the same website. This can be used to throttle the crawling speed to avoid hitting servers too hard.

DOWNLOAD_DELAY = 0.25    # 250 ms of delay 

Read the docs: https://doc.scrapy.org/en/latest/index.html

like image 88
warvariuc Avatar answered Sep 17 '22 17:09

warvariuc