My Scrapy program only uses one CPU core, no matter what I set CONCURRENT_REQUESTS to. Is there any way to make a single Scrapy crawler use all CPU cores?

P.S.: There seems to have been a max_proc argument in earlier versions, but I cannot find it now.
Scrapy does not use multiple CPUs.
This is by design. The bottleneck in Scrapy is usually not the CPU but network input/output, so even on a single CPU, Scrapy can be more efficient than a synchronous framework or library (e.g. requests) combined with multiprocessing.
If the CPU is a bottleneck in your case, consider having a separate, multiprocessing-enabled process handle the CPU-heavy parts, for example as a post-processing step like the sketch below.
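Here is a minimal sketch of that idea, assuming the spider has already exported raw items to items.jl with the JSON Lines feed exporter; parse_document is a hypothetical stand-in for whatever your CPU-heavy work is:

```python
import json
from multiprocessing import Pool


def parse_document(raw):
    # Placeholder for the expensive, CPU-bound step (heavy text
    # processing, parsing, etc.); this is what benefits from extra cores.
    return {"url": raw.get("url"), "length": len(raw.get("body", ""))}


if __name__ == "__main__":
    # Read the raw items the (I/O-bound, single-process) crawl produced.
    with open("items.jl") as f:
        raw_items = [json.loads(line) for line in f]

    # Pool spreads the CPU-bound work across all available cores,
    # while the crawl itself stays single-process.
    with Pool() as pool:
        for result in pool.imap_unordered(parse_document, raw_items):
            print(result)
```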
If you still want to run Scrapy spiders in multiple processes, see Running Scrapy from a script. You can combine that with Python's multiprocessing module. Or, better yet, use Scrapyd or one of its alternatives.
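A rough sketch of the multiprocessing approach, assuming you run it inside a Scrapy project and have a spider registered under the (hypothetical) name "myspider"; each child process gets its own Twisted reactor and CrawlerProcess, so the crawls can run on different cores:

```python
from multiprocessing import Process

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def run_spider(start_urls):
    # Each child process creates its own CrawlerProcess, so the
    # one-reactor-per-process limitation does not apply across processes.
    process = CrawlerProcess(get_project_settings())
    process.crawl("myspider", start_urls=start_urls)  # hypothetical spider name
    process.start()  # blocks until this crawl finishes


if __name__ == "__main__":
    # Split the work, e.g. by start URL, one group per process.
    url_groups = [
        ["https://example.com/a"],
        ["https://example.com/b"],
    ]
    workers = [Process(target=run_spider, args=(group,)) for group in url_groups]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```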