Concurrency and Selenium - Multiprocessing vs Multithreading

I have a script that uses a lot of headless Selenium automation and looped HTTP requests. It's very important that I implement a threading/worker queue for this script. I've done that.

My question is: should I be using multithreading or multiprocessing, that is, a thread pool or a process pool? I know that:

"If your program spends more time waiting on file reads or network requests or any type of I/O task, then it is an I/O bottleneck and you should be looking at using threads to speed it up."

and...

"If your program spends more time in CPU based tasks over large datasets then it is a CPU bottleneck. In this scenario you may be better off using multiple processes in order to speed up your program. I say may as it’s possible that a single-threaded Python program may be faster for CPU bound problems, it can depend on unknown factors such as the size of the problem set and so on."

Which is the case when it comes to Selenium? Am I right to think that all CPU-bound tasks related to Selenium will be executed separately via the web driver or would my script benefit from multiple processes?

Or to be more concise: When I thread Selenium in my script, is the web driver limited to 1 CPU core, the same core the script threads are running on?

xendi asked Oct 02 '18 23:10




1 Answer

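A web driver is just a driver, and a driver cannot drive without a car.

For example, when you use ChromeDriver to communicate with a browser, you are launching Chrome. ChromeDriver itself does no computation; Chrome does.

So, to clarify: WebDriver is a tool for controlling a browser, but it is not itself a browser. The browser runs in its own operating-system processes, separate from your Python interpreter, so its CPU work is not confined to the core your script's threads run on.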
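To make the "separate process" point concrete, here is a minimal sketch that uses only the standard library. A child Python interpreter stands in for the browser (that substitution is an assumption made so the example runs without Selenium or Chrome), and `run_child` and `child_pids` are hypothetical helper names. Each thread starts a child and simply waits for it, much as a thread waits on a browser launched through ChromeDriver.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_child(_):
    """Start a child interpreter (a stand-in for the browser) and return its PID."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def child_pids(n=3):
    """Launch n children from n threads; each thread only waits on its child."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_child, range(n)))


if __name__ == "__main__":
    print("script PID:", os.getpid())
    print("child PIDs:", child_pids())
```

The PIDs reported by the children differ from the script's own PID: the work happens in other processes, which the operating system is free to schedule on any available core.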

Given this, you should use a thread pool rather than a process pool: from the point of view of your Python script the work is I/O-bound, since the threads spend their time waiting on the browser and on HTTP responses.
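A thread pool for this kind of work might look like the sketch below. It is an illustration under assumptions, not the asker's actual script: `simulated_page_load`, `selenium_page_title`, and `run_all` are hypothetical names, and the default task only sleeps to imitate a thread waiting on the browser. The Selenium variant assumes the `selenium` package and a local Chrome install, and creates one driver per task rather than sharing a driver between threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_page_load(url, delay=0.2):
    """Stand-in task: the thread just waits, as it would on a page load."""
    time.sleep(delay)
    return f"loaded {url}"


def selenium_page_title(url):
    """Hypothetical real task; requires the selenium package and Chrome."""
    # Imported lazily so the rest of the sketch runs without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)  # one driver per task
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always close the browser, even on errors


def run_all(urls, task=simulated_page_load, workers=4):
    """Run task over urls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(4)]
    start = time.perf_counter()
    results = run_all(urls)
    print(results)
    print(f"{time.perf_counter() - start:.2f}s for {len(urls)} tasks")
```

To try it against a real browser, pass `task=selenium_page_title` to `run_all`. Because the threads mostly wait, the waits overlap and the total time stays close to that of the slowest task instead of the sum of all tasks.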

Sraw answered Sep 30 '22 03:09