Multi-threading in Selenium Python

I am working on a project that needs a bit of automation and web scraping, for which I am using Selenium and BeautifulSoup (Python 2.7).

I want to open only one instance of a web browser and log in to a website. Keeping that session, I am trying to open new tabs that are controlled independently by threads, each thread controlling one tab and performing its own task. How should I do it? Example code would be nice. Here is my code:

import threading

from selenium import webdriver


def threadFunc(driver, tabId):
    if tabId == 1:
        # open a new tab and do something in it
        pass
    elif tabId == 2:
        # open another new tab with some different link and perform some task
        pass
    # ... other cases


class tabThreads(threading.Thread):

    def __init__(self, driver, tabId):
        threading.Thread.__init__(self)
        self.tabID = tabId
        self.driver = driver

    def run(self):
        print("Executing tab %d" % self.tabID)
        threadFunc(self.driver, self.tabID)


def func():
    # Create the main window
    driver = webdriver.Firefox()
    driver.get("...someLink...")

    # This is the part where I am stuck: whether to create threads and pass
    # them the same webdriver, so they stay in the current session by using
    # the JavaScript call "window.open('')", or to use a separate driver for
    # each tab to operate on individual pages, but that opens a new browser
    # instance every time a driver is created.

    thread1 = tabThreads(driver, 1)
    thread2 = tabThreads(driver, 2)
    # ... other threads

    # The threads do nothing until they are started; wait for all of them
    for t in (thread1, thread2):
        t.start()
    for t in (thread1, thread2):
        t.join()

I am open to suggestions for using any other module, if needed.

SUMEET DEWANGAN asked Dec 28 '16 02:12



People also ask

Can you do multithreading in Selenium?

Yes, although parallel runs normally give each thread its own WebDriver instance (and therefore its own browser), because a single WebDriver is not designed to be driven from several threads at once. In Java, the TestNG framework provides parallel execution built on Java multi-threading; its XML suite file holds the configuration, such as which tests run in parallel, with how many threads, and with which parameters.
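In Python, one common way to give each thread its own driver is `threading.local()`. The following is a minimal sketch; `get_thread_driver` and `quit_all_drivers` are hypothetical helpers, and `make_driver` is assumed to be whatever creates a driver (for example `webdriver.Firefox`). Each thread therefore opens its own browser, which is the cost the question mentions, and by default each new driver starts with a fresh browser profile, so a login done in one browser is not shared with the others.

```python
import threading

_local = threading.local()         # each thread sees its own attributes here
_all_drivers = []                  # every driver created, so all can be quit
_registry_lock = threading.Lock()  # guards _all_drivers across threads


def get_thread_driver(make_driver):
    """Return the calling thread's own driver, creating it on first use."""
    driver = getattr(_local, "driver", None)
    if driver is None:
        driver = make_driver()  # e.g. webdriver.Firefox: one browser per thread
        _local.driver = driver
        with _registry_lock:
            _all_drivers.append(driver)
    return driver


def quit_all_drivers():
    """Quit every driver handed out so far (call once the threads are done)."""
    with _registry_lock:
        drivers = list(_all_drivers)
        del _all_drivers[:]
    for driver in drivers:
        driver.quit()
```

A worker thread would call `get_thread_driver(webdriver.Firefox)` whenever it needs a browser and always get back the same one, while a different thread gets a different one.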

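Is multi-threading possible in Python?

Yes. Python supports multi-threading through the standard threading module. However, in CPython the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time, so threads do not give true multi-core execution of Python code. The GIL does not prevent threading itself, and threads still help with I/O-bound work such as waiting for network responses.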

Is Python single or multi-threaded?

Python is not a single-threaded language. In CPython, though, the GIL lets only one thread run Python bytecode at a time, so pure-Python code in one process typically uses only one core. Libraries that perform computationally heavy tasks, such as NumPy, SciPy and PyTorch, use C-based implementations under the hood that can release the GIL, allowing them to use multiple cores.

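Can Python threads run on multiple cores?

Python threads cannot use multiple cores to run CPU-bound Python code in parallel. This is due to an internal implementation detail called the GIL (global interpreter lock) in CPython, the C implementation of Python, which is almost certainly the one you use. A thread that is blocked on I/O releases the GIL, however, so I/O-bound work in several threads can still overlap.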
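A small sketch of the I/O-bound case, in Python 3. Here `time.sleep` stands in for waiting on a network response (an assumption made so the example runs offline), and `run_in_threads` and `io_bound` are hypothetical names. Because a sleeping thread releases the GIL, four half-second waits overlap, and the whole run should take about half a second rather than two.

```python
import threading
import time


def run_in_threads(task, n):
    """Start n threads running task(), wait for all, return elapsed seconds."""
    threads = [threading.Thread(target=task) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def io_bound():
    # Stand-in for blocking I/O: like a socket read, sleep releases the GIL,
    # so the other threads keep running meanwhile
    time.sleep(0.5)
```

Calling `run_in_threads(io_bound, 4)` shows the overlap. A CPU-bound pure-Python loop in place of `time.sleep` would not get this speed-up on a standard CPython build, which is why threads suit browser automation and scraping (mostly waiting) better than number crunching.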


2 Answers

If you are using the script to automatically submit forms (simply put, making GET and POST requests), I would recommend looking at requests. You can easily capture POST requests from your browser (the Network tab in the developer tools of both Firefox and Chrome) and submit them yourself. Something like:

import requests
from bs4 import BeautifulSoup

session = requests.Session()
response = session.get('https://stackoverflow.com/')
soup = BeautifulSoup(response.text, 'html.parser')

and POST data like this:

postdata = {'username': 'John', 'password': password}  # password is defined elsewhere
response = session.post('https://example.com', data=postdata, allow_redirects=True)

It can easily be threaded and is many times faster than Selenium. The only drawback is that there is no JavaScript support and no form handling, so you have to build the requests the old-fashioned way.

EDIT: Also take a look at ThreadPoolExecutor (in the concurrent.futures module).
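A minimal sketch of `concurrent.futures.ThreadPoolExecutor` applied to this kind of scraping (Python 3; on Python 2.7 the `futures` backport package provides the same module). `scrape_all` is a hypothetical helper, and `fetch` is assumed to be any callable that takes a URL and returns whatever you want to keep, such as the page text.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a pool of worker threads.

    Returns a dict mapping each URL to fetch's result. An exception raised
    by fetch is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs them correctly
        return dict(zip(urls, pool.map(fetch, urls)))
```

For example, `scrape_all(urls, lambda url: session.get(url).text)` reuses the session from above. requests does not promise that a `Session` is thread-safe, though, so giving each worker thread its own session (for instance via `threading.local()`) is the more cautious choice.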

Anunay answered Oct 03 '22 17:10



My understanding is that Selenium drivers are not thread-safe. In the 2012 WebDriver working draft, the Thread Safety section is empty, which I take to mean the topic has not been addressed at all: https://www.w3.org/TR/2012/WD-webdriver-20120710/#thread-safety

So while you could share the driver reference among multiple threads and call the driver from each of them, there is no guarantee that the driver will handle multiple concurrent calls correctly.

Instead, you must either synchronize calls from multiple threads so that one completes before the next starts, or have just one thread make Selenium API calls, potentially handling commands from a queue that is filled by multiple other threads.
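A minimal sketch of both options in Python 3. `TabRunner`, `driver_worker`, `start_driver_worker` and `call_on_driver` are hypothetical helpers, not Selenium APIs; the only driver methods assumed are Selenium's `switch_to.window(handle)` and `quit()`. For option 1, the tab handles would come from `driver.window_handles` after opening tabs with `driver.execute_script("window.open('');")`, as the question suggests. There is no timeout or startup-error handling: if `make_driver` raises, callers of `call_on_driver` block forever.

```python
import queue
import threading


# Option 1: synchronize. All threads share one driver, but each holds the lock
# for its whole switch-tab-then-act sequence, because switch_to.window()
# changes state that every thread sees.
class TabRunner:
    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.Lock()

    def run_in_tab(self, handle, task):
        with self._lock:
            self._driver.switch_to.window(handle)  # make this tab current
            return task(self._driver)  # no other thread can use the driver now


# Option 2: one thread owns the driver and executes commands from a queue.
def driver_worker(make_driver, commands):
    driver = make_driver()  # created and used on this thread only
    try:
        while True:
            command = commands.get()
            if command is None:  # shutdown sentinel
                break
            fn, reply = command
            try:
                reply.put(("ok", fn(driver)))
            except Exception as exc:  # hand the error back to the caller
                reply.put(("error", exc))
    finally:
        driver.quit()


def start_driver_worker(make_driver):
    """Start the worker thread; return its command queue and the thread."""
    commands = queue.Queue()
    worker = threading.Thread(target=driver_worker, args=(make_driver, commands))
    worker.start()
    return commands, worker


def call_on_driver(commands, fn):
    """Usable from any thread: queue fn(driver) and wait for its result."""
    reply = queue.Queue(maxsize=1)
    commands.put((fn, reply))
    status, value = reply.get()
    if status == "error":
        raise value
    return value
```

With option 1 the browser still executes one task at a time, so threads only gain speed on work done outside the lock, such as parsing `driver.page_source` with BeautifulSoup. Option 2 gives the same one-at-a-time guarantee without a lock; `commands.put(None)` followed by `worker.join()` stops the worker and quits the browser.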

Also see the question "Can Selenium use multi-threading in one browser?"

CMerrill answered Oct 03 '22 15:10
