How to use all CPU cores for Scrapy

My Scrapy program uses only one CPU core, no matter how high I set CONCURRENT_REQUESTS. Is there any way to make a single Scrapy crawler use all CPU cores?

PS: earlier versions seem to have had a max_proc argument, but I cannot find it now.

asked Jul 10 '17 by pyfreyr

1 Answer

Scrapy does not use multiple CPUs.

This is by design. The bottleneck of a Scrapy crawl is usually network input/output, not the CPU. So, even on a single CPU, Scrapy can be more efficient than a synchronous framework or library (e.g. requests) used in combination with multiprocessing.

If the CPU is the bottleneck in your case, consider offloading the CPU-heavy parts to a separate, multiprocessing-enabled process, as in the sketch below.
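For example, here is a minimal sketch of an item pipeline that pushes CPU-bound work to a process pool while the reactor keeps handling network I/O. It assumes Scrapy 2.0+ (coroutine support in process_item) with the asyncio Twisted reactor enabled via the TWISTED_REACTOR setting; cpu_heavy_transform is a hypothetical stand-in for your own CPU-bound function:

```python
# pipelines.py -- a sketch, not the canonical Scrapy way; assumes Scrapy >= 2.0
# with the asyncio reactor enabled in settings.py:
#   TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy_transform(body):
    # Hypothetical stand-in for real CPU-bound work (parsing, NLP, hashing...).
    return body.upper()


class CpuOffloadPipeline:
    def open_spider(self, spider):
        # Defaults to one worker process per CPU core.
        self.executor = ProcessPoolExecutor()

    def close_spider(self, spider):
        self.executor.shutdown()

    async def process_item(self, item, spider):
        # Run the CPU-bound step in a worker process and await the result
        # without blocking the Twisted reactor.
        future = self.executor.submit(cpu_heavy_transform, item["body"])
        item["body"] = await asyncio.wrap_future(future)
        return item
```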

If you still want to run Scrapy spiders in multiple processes, see Running Scrapy from a script. You can combine that with Python's multiprocessing module; a rough sketch of that combination follows. Or, better yet, use Scrapyd or one of its alternatives.
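Here is one way that combination could look, with one reactor (and one crawl) per OS process. The spider name "myspider" and the URL split are hypothetical stand-ins for your own project:

```python
# run_spiders.py -- a sketch combining "Running Scrapy from a script"
# with multiprocessing: one Twisted reactor and one crawl per child process.
import multiprocessing

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def crawl(start_urls):
    # Each child process gets its own reactor, so start one crawl per child.
    process = CrawlerProcess(get_project_settings())
    # "myspider" is a hypothetical spider name from your project.
    process.crawl("myspider", start_urls=start_urls)
    process.start()  # blocks until this crawl finishes


if __name__ == "__main__":
    # Hypothetical split of the workload across two processes.
    chunks = [
        ["https://example.com/section/a"],
        ["https://example.com/section/b"],
    ]
    workers = [multiprocessing.Process(target=crawl, args=(c,)) for c in chunks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

Incidentally, the max_proc option mentioned in the question lives in Scrapyd, where it caps how many Scrapy processes run concurrently.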

answered Sep 29 '22 by Gallaecio