I plan to use Scrapy in a more distributed way, and I'm not sure whether the spiders, pipelines, downloader, scheduler, and engine are all hosted in separate processes or threads. Could anyone share some information about this? Can we change the process/thread count for each component? I know there are two settings, CONCURRENT_REQUESTS and CONCURRENT_ITEMS; do they determine the number of concurrent threads for the downloader and the pipelines? And if I want to deploy the spiders, pipelines, and downloader on different machines, I need to serialize the items/requests/responses, right? Thanks very much for your help!
Thanks, Edward.
Scrapy is single-threaded. It achieves concurrent network requests through the reactor pattern, using the Twisted framework. Because of that, CONCURRENT_REQUESTS and CONCURRENT_ITEMS do not control thread counts: they cap how many requests the downloader keeps in flight and how many items the pipelines process in parallel, all within one event loop.
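For illustration, here's a minimal settings.py sketch tuning those knobs (the values are arbitrary examples, not recommendations):

```python
# settings.py -- concurrency knobs for a Scrapy project.
# These limit concurrent *asynchronous* operations inside one
# single-threaded Twisted reactor; no extra threads are created.

# Max requests the downloader keeps in flight at once (default: 16).
CONCURRENT_REQUESTS = 32

# Max items processed in parallel by the item pipelines,
# per response (default: 100).
CONCURRENT_ITEMS = 200

# Related per-domain cap, often the real bottleneck (default: 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 16
```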
People who want to distribute Scrapy usually introduce a messaging layer between machines. Some use Redis (for example, the scrapy-redis project keeps the scheduler queue and duplicate filter in Redis), others use RabbitMQ. Either way, requests and items that cross machine boundaries have to be serialized, as you guessed.
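As a sketch of the idea (not the scrapy-redis implementation), here is a hypothetical item pipeline that serializes each scraped item to JSON and pushes it onto a Redis list, so a consumer on another machine can pop items off and post-process them. It assumes a Redis server at localhost:6379 and the redis-py package; the key name items_queue is made up for the example:

```python
import json

import redis


class RedisExportPipeline:
    """Serialize each item to JSON and push it onto a Redis list.

    A worker on another machine can consume the queue (e.g. with
    BLPOP) and run the heavy post-processing there.
    """

    def open_spider(self, spider):
        # Assumed connection details; adjust for your deployment.
        self.client = redis.Redis(host="localhost", port=6379)

    def process_item(self, item, spider):
        # dict(item) works for scrapy.Item subclasses; items must be
        # JSON-serializable to cross the process/machine boundary.
        self.client.rpush("items_queue", json.dumps(dict(item)))
        return item
```

You would enable it by adding it to ITEM_PIPELINES in settings.py.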
Also have a look at Scrapyd, which lets you deploy your projects to a server and schedule spider runs over a simple HTTP JSON API.
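For instance, once a project is deployed, scheduling a crawl is a single POST to Scrapyd's schedule.json endpoint. A minimal sketch (the project and spider names are placeholders, and it assumes Scrapyd is running locally on its default port 6800):

```python
import json
from urllib import parse, request

# Placeholder project/spider names; replace with your own.
data = parse.urlencode({"project": "myproject", "spider": "myspider"}).encode()
resp = request.urlopen("http://localhost:6800/schedule.json", data=data)
print(json.load(resp))  # e.g. {"status": "ok", "jobid": "..."}
```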