 

Is Scrapy single-threaded or multi-threaded?

There are a few concurrency settings in Scrapy, like CONCURRENT_REQUESTS. Does that mean the Scrapy crawler is multi-threaded? If I run scrapy crawl my_crawler, will it literally fire multiple simultaneous requests in parallel? I'm asking because I've read that Scrapy is single-threaded.

Gill Bates asked Jul 15 '14 14:07





2 Answers

Scrapy is single-threaded, except for the interactive shell and some tests (see source).

It's built on top of Twisted, which is also single-threaded, and it relies on Twisted's own asynchronous concurrency capabilities, such as twisted.internet.interfaces.IReactorThreads.callFromThread (see source).
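To see how one thread can still keep many requests in flight, here is a minimal sketch that uses Python's standard-library asyncio as a stand-in for Twisted's reactor. This is a swapped-in event loop, not Scrapy's or Twisted's actual code; the coroutine `fake_request` is hypothetical, and `asyncio.sleep` stands in for waiting on the network. Each coroutine records the ID of the thread it runs on.

```python
import asyncio
import threading


async def fake_request(i, thread_ids):
    # Hypothetical "request": note which OS thread is running it.
    thread_ids.add(threading.get_ident())
    # Awaiting hands control back to the event loop, so the other
    # requests can make progress while this one "waits on the network".
    await asyncio.sleep(0.01)
    return i


async def main():
    thread_ids = set()
    # Start ten requests concurrently and wait for all of them.
    results = await asyncio.gather(*(fake_request(i, thread_ids) for i in range(10)))
    return results, thread_ids


# asyncio.run drives the event loop on the current thread.
results, thread_ids = asyncio.run(main())
print(results, len(thread_ids))
```

All ten simulated requests overlap in time, yet `thread_ids` ends up holding a single ID: the concurrency comes from the event loop switching between tasks while they wait, not from extra threads.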

famousgarkin answered Oct 18 '22 05:10



Scrapy does most of its work synchronously. However, requests are handled asynchronously.
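The split between synchronous processing and asynchronous downloading can be sketched with asyncio standing in for Twisted. This is an illustration only: `download` and `parse` are hypothetical names, with `parse` playing the role of a spider callback. Downloads overlap while they wait, but each `parse` call runs from start to finish without interruption, which is also why blocking code inside a callback would stall every other request.

```python
import asyncio


async def download(url, log):
    # Hypothetical network fetch: the await lets other downloads run meanwhile.
    log.append(("download-start", url))
    await asyncio.sleep(0.01)
    log.append(("download-end", url))
    return f"<html>{url}</html>"


def parse(body, log):
    # Ordinary synchronous function, like a spider callback: once it starts,
    # nothing else runs on the loop thread until it returns.
    log.append(("parse-start", body))
    items = [body.upper()]
    log.append(("parse-end", body))
    return items


async def handle(url, log):
    body = await download(url, log)  # asynchronous part
    return parse(body, log)          # synchronous part


async def main(urls):
    log = []
    results = await asyncio.gather(*(handle(u, log) for u in urls))
    return log, results


log, results = asyncio.run(main(["a", "b", "c"]))
for event in log:
    print(event)
```

In the printed log, all three downloads start before any of them finishes, while every parse-start is immediately followed by its own parse-end.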

I suggest reading this page if you haven't already:

http://doc.scrapy.org/en/latest/topics/architecture.html

Edit: I realize now that the question was about threading, not necessarily about whether Scrapy is asynchronous. That link is still a good read, though :)

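Regarding your question about CONCURRENT_REQUESTS: this setting caps the number of requests that Twisted will have pending (deferred) at once. Once that many requests have been started, it waits for some of them to finish before starting more. So several requests really are in flight at the same time, but they are multiplexed on a single thread rather than run in parallel threads.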
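As a rough model of what such a cap does, the sketch below limits in-flight requests with an `asyncio.Semaphore`. It illustrates the idea only and is not how Scrapy's downloader is implemented; `fetch`, `crawl`, and the limit of 3 are hypothetical. In a real project you would simply set `CONCURRENT_REQUESTS` in the project's settings.py.

```python
import asyncio

# Hypothetical cap, named after the Scrapy setting it imitates.
CONCURRENT_REQUESTS = 3


async def fetch(url, sem, stats):
    # Wait here if CONCURRENT_REQUESTS requests are already in flight.
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated network wait
        stats["in_flight"] -= 1
        return url


async def crawl(urls, limit):
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    pages = await asyncio.gather(*(fetch(u, sem, stats) for u in urls))
    return pages, stats["peak"]


pages, peak = asyncio.run(crawl([f"page{i}" for i in range(10)], CONCURRENT_REQUESTS))
print(len(pages), peak)
```

The shared `stats` dict is updated without a lock; that is safe here only because every task runs on the same thread and switches only at `await` points.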

rocktheartsm4l answered Oct 18 '22 05:10
