I'm confused about parallel execution in python using selenium. There seem to be a few ways to go about it, but some seem out of date.
There's a python module called python-wd-parallel, which seems to have some functionality to do this, but it's from 2013. Is it still useful now? I also found this example.
There's concurrent.futures, which seems a lot newer, but is not so easy to implement. Does anyone have a working example of parallel execution in selenium?
There's also using just threads and executors to get the job done, but I feel this will be slower, because it's not using all the cores and is still running serially.
What is the latest way to do parallel execution using selenium?
Selenium offers a suite of tools that enables organizations to integrate test automation into the CI/CD pipeline and run Selenium parallel execution on various devices and browsers running on different platforms.
Parallel Testing using TestNG and Selenium
Methods: Helps run methods in separate threads.
Tests: Helps run all methods belonging to the same tag in the same thread.
Classes: Helps run all methods belonging to a class in a single thread.
Instances: Helps run all methods in the same instance in the same thread.
unittest-parallel is a parallel unit test runner for Python with coverage support. By default, unittest-parallel runs unit tests on all CPU cores available. To run your unit tests with coverage, add either the "--coverage" option (for line coverage) or the "--coverage-branch" for line and branch coverage.
Use joblib's Parallel class to do that; it's a great library for parallel execution.
Let's say we have a list of urls named urls and we want to take a screenshot of each one in parallel.
First, let's import the necessary libraries:
    from selenium import webdriver
    from joblib import Parallel, delayed
Now let's define a function that takes a screenshot as base64:
    def take_screenshot(url):
        # Note: webdriver.PhantomJS was removed in Selenium 4, so this needs Selenium 3.x
        phantom = webdriver.PhantomJS('/path/to/phantomjs')
        phantom.get(url)
        screenshot = phantom.get_screenshot_as_base64()
        phantom.quit()  # quit() ends the session and shuts down the browser process
        return screenshot
Now, to execute that in parallel, what you would do is:
    screenshots = Parallel(n_jobs=-1)(delayed(take_screenshot)(url) for url in urls)
When this line finishes executing, screenshots will hold all of the data from all of the processes that ran.
Explanation about Parallel
Parallel(n_jobs=-1) means use all of the CPU cores available.
delayed(function)(input) is joblib's way of creating the input for the function you are trying to run in parallel.
More information can be found in the joblib docs.
Threads and processes will both give you a considerable speed-up on your selenium code. Short examples are given below. The selenium work is done by the selenium_title function, which returns the page title. These examples don't deal with exceptions raised during each thread/process execution. For that, see Long Answer - Dealing with exceptions.
Using concurrent.futures.ThreadPoolExecutor:
    from selenium import webdriver
    from concurrent import futures

    def selenium_title(url):
        wdriver = webdriver.Chrome()  # chrome webdriver
        wdriver.get(url)
        title = wdriver.title
        wdriver.quit()
        return title

    links = ["https://www.amazon.com", "https://www.google.com"]

    with futures.ThreadPoolExecutor() as executor:  # default/optimized number of threads
        titles = list(executor.map(selenium_title, links))
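Each call to selenium_title starts a full browser, so it can help to cap how many run at once. Here is a minimal sketch of that, assuming a hypothetical stand-in function fake_title (not part of the original answer) in place of the real selenium call so the snippet runs without a browser. It also shows that executor.map returns results in the order of the inputs, even when a later item finishes first:

```python
from concurrent import futures
import time


def fake_title(url):
    # Hypothetical stand-in for selenium_title: sleeps instead of driving a browser.
    # The first link deliberately takes longer than the second.
    time.sleep(0.05 if "amazon" in url else 0.01)
    return "title of " + url


links = ["https://www.amazon.com", "https://www.google.com"]

# max_workers caps how many calls (with selenium: how many browsers) run at once.
with futures.ThreadPoolExecutor(max_workers=2) as executor:
    titles = list(executor.map(fake_title, links))

print(titles)
```

With real browsers, a small max_workers value is usually a sensible starting point, since memory use grows with every browser instance that is open at the same time.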
Using concurrent.futures.ProcessPoolExecutor: you just need to replace ThreadPoolExecutor with ProcessPoolExecutor in the code above. They are both derived from the Executor base class. You must also protect the main module, like below:
    if __name__ == '__main__':
        with futures.ProcessPoolExecutor() as executor:  # default/optimized number of processes
            titles = list(executor.map(selenium_title, links))
Why do threads work despite the Python GIL?
Even though Python has limitations on threads due to the GIL, and even though threads are context switched, the performance gain comes from an implementation detail of Selenium. Selenium works by sending commands as HTTP requests (POST, GET) to the browser driver server. As you might already know, I/O-bound tasks such as HTTP requests release the GIL, hence the performance gain.
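The effect can be seen without a browser. In the sketch below, wait_for_driver is a hypothetical stand-in for a blocking webdriver command (an assumption for illustration, not something from selenium): it only calls time.sleep, which releases the GIL in the same way that waiting on an HTTP response does. The timings are measured when you run it and are not fixed values:

```python
import time
from concurrent import futures


def wait_for_driver(seconds):
    # Hypothetical stand-in for a blocking webdriver command.
    # time.sleep releases the GIL, much like waiting on a socket does.
    time.sleep(seconds)
    return seconds


delays = [0.2, 0.2, 0.2, 0.2]

# Serial: each wait starts only after the previous one has finished.
start = time.perf_counter()
serial = [wait_for_driver(d) for d in delays]
serial_elapsed = time.perf_counter() - start

# Threaded: the waits overlap because a waiting thread does not hold the GIL.
start = time.perf_counter()
with futures.ThreadPoolExecutor(max_workers=len(delays)) as executor:
    threaded = list(executor.map(wait_for_driver, delays))
threaded_elapsed = time.perf_counter() - start

print(f"serial: {serial_elapsed:.2f}s, threaded: {threaded_elapsed:.2f}s")
```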
We can make small modifications to the example above to deal with exceptions raised in the spawned threads. Instead of using executor.map we use executor.submit, which returns the title wrapped in Future instances.
To access the returned title we can use future_titles[index].result(), where index ranges from 0 to len(links) - 1, or simply use a for loop like below.
    with futures.ThreadPoolExecutor() as executor:
        future_titles = [executor.submit(selenium_title, link) for link in links]
        for future_title, link in zip(future_titles, links):
            try:
                title = future_title.result()  # can use `timeout` to wait max seconds for each thread
            except Exception as exc:  # this thread might have had an exception
                print('url {0} generated an exception: {1}'.format(link, exc))
Note that besides iterating over future_titles we also iterate over links, so if an exception is raised in some thread we know which url (link) was responsible for it.
The futures.Future class is useful because it gives you control over the result received from each thread, such as whether it completed correctly or raised an exception, among other things. More about it here.
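As a rough illustration of that control, the sketch below inspects two Future objects directly. slow_title is a hypothetical stand-in for selenium_title (it sleeps or raises instead of opening a browser), and the urls are made up for the example:

```python
from concurrent import futures
import time


def slow_title(url):
    # Hypothetical stand-in for selenium_title.
    if "bad" in url:
        raise ValueError("could not load " + url)
    time.sleep(0.5)
    return "title of " + url


with futures.ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(slow_title, "https://www.google.com")
    failed = executor.submit(slow_title, "https://bad.example")

    # exception() waits for the call to finish and returns the exception
    # object instead of raising it (None if the call succeeded).
    error = failed.exception()

    # result(timeout=...) raises futures.TimeoutError if the call is not done in time.
    try:
        ok.result(timeout=0.01)
        timed_out = False
    except futures.TimeoutError:
        timed_out = True

    title = ok.result()  # no timeout: blocks until the call has finished

print(repr(error), timed_out, title, ok.done())
```

Note that a timeout on result() only stops the waiting. The worker thread itself keeps running until the call returns.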
Also worth mentioning: futures.as_completed is better if you don't care in which order the threads return items. But since the syntax for handling exceptions with it is a little ugly, I omitted it here.
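For completeness, here is one possible shape of the as_completed variant, following the pattern of mapping each future back to its link with a dict. fake_title is a hypothetical stand-in for selenium_title and the urls are made up, so treat this as a sketch rather than the original author's code:

```python
from concurrent import futures


def fake_title(url):
    # Hypothetical stand-in for selenium_title.
    if "bad" in url:
        raise ValueError("could not load page")
    return "title of " + url


links = ["https://www.amazon.com", "https://bad.example", "https://www.google.com"]

titles, errors = {}, {}
with futures.ThreadPoolExecutor() as executor:
    # Map each future back to its link, since completion order is arbitrary.
    future_to_link = {executor.submit(fake_title, link): link for link in links}
    for future in futures.as_completed(future_to_link):
        link = future_to_link[future]
        try:
            titles[link] = future.result()
        except Exception as exc:
            errors[link] = exc
            print('url {0} generated an exception: {1}'.format(link, exc))
```

The dict does the job that zip(future_titles, links) does in the submit example: it keeps track of which link each result or exception belongs to.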
First, why I've always used threads for speeding up my selenium code: on I/O-bound tasks it makes little difference whether you use a pool of processes (Process) or threads (Threads). Others also reach similar conclusions about Python threads vs processes on I/O-bound tasks.