
Don't wait for a page to load using Selenium in Python

How do I make Selenium click on elements and scrape data before the page has fully loaded? My internet connection is quite terrible, so the page sometimes takes forever to load entirely. Is there any way around this?

no nein asked Sep 20 '17 12:09


2 Answers

ChromeDriver 77.0 (which supports Chrome version 77) now supports eager as pageLoadStrategy.

Resolved issue 1902: Support eager page load strategy [Pri-2]


Since your question mentions clicking on elements and scraping data before the page has fully loaded, the pageLoadStrategy capability can help here. By default, Selenium loads a page/URL with pageLoadStrategy set to normal, but it can start executing the next line of code at a different document readiness state. Selenium currently supports three document readiness states, which can be configured through pageLoadStrategy as follows:

  1. none (undefined)
  2. eager (page becomes interactive)
  3. normal (complete page load)
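To make the three values concrete, here is a small Python model (not Selenium code) of when a navigation call would hand control back under each strategy. It assumes the usual document.readyState progression of loading, then interactive, then complete. The names READY_ORDER, REQUIRED_STATE and navigation_returns are made up for this illustration.

```python
# Order in which a document's readyState advances while loading (assumed).
READY_ORDER = ["loading", "interactive", "complete"]

# Readiness state each pageLoadStrategy waits for (None = does not wait at all).
REQUIRED_STATE = {
    "none": None,
    "eager": "interactive",
    "normal": "complete",
}


def navigation_returns(strategy, ready_state):
    """Return True if, under `strategy`, the driver would stop waiting at `ready_state`."""
    required = REQUIRED_STATE[strategy]
    if required is None:
        return True
    return READY_ORDER.index(ready_state) >= READY_ORDER.index(required)


# Show, per strategy, at which readiness states control would be returned.
for strategy in REQUIRED_STATE:
    print(strategy, [navigation_returns(strategy, s) for s in READY_ORDER])
```

In other words, eager hands control back once the document is interactive, while normal keeps waiting until every resource has finished loading.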

Here is the code block to configure the pageLoadStrategy:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
    caps = DesiredCapabilities().FIREFOX
    # caps["pageLoadStrategy"] = "normal"  # complete
    caps["pageLoadStrategy"] = "eager"  # interactive
    # caps["pageLoadStrategy"] = "none"  # undefined
    driver = webdriver.Firefox(
        capabilities=caps,
        firefox_binary=binary,
        executable_path="C:\\Utility\\BrowserDrivers\\geckodriver.exe",
    )
    driver.get("https://google.com")
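
The snippet above uses the Selenium 3 keyword arguments. As far as I know, recent Selenium 4 releases no longer accept capabilities=, firefox_binary= or executable_path= in the driver constructor, and the strategy is set on an Options object instead. Here is a minimal sketch under that assumption. firefox_options, open_page and VALID_STRATEGIES are hypothetical helper names, and Selenium is imported lazily only so that the argument check can run without it.

```python
VALID_STRATEGIES = ("normal", "eager", "none")


def firefox_options(strategy="eager"):
    """Build Firefox options with the given pageLoadStrategy (Selenium 4 style)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown pageLoadStrategy: {strategy!r}")
    # Lazy import: the validation above works even if Selenium is not installed.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.page_load_strategy = strategy
    return options


def open_page(url, strategy="eager"):
    """Start Firefox and navigate; get() returns according to the chosen strategy."""
    from selenium import webdriver

    driver = webdriver.Firefox(options=firefox_options(strategy))
    driver.get(url)
    return driver
```

Calling open_page("https://google.com") should then return as soon as the document is interactive. The Chrome Options class exposes the same page_load_strategy property, so the same pattern should carry over to ChromeDriver.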
undetected Selenium answered Sep 18 '22 20:09


For ChromeDriver it works the same as in @DebanjanB's answer. However, when this answer was written the 'eager' page load strategy was not yet supported; support was added in ChromeDriver 77.

So for ChromeDriver you get:

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = DesiredCapabilities().CHROME
    # caps["pageLoadStrategy"] = "normal"  # Waits for full page load
    caps["pageLoadStrategy"] = "none"  # Do not wait for full page load
    driver = webdriver.Chrome(desired_capabilities=caps, executable_path="path/to/chromedriver.exe")

Note that when using the 'none' strategy you most likely have to implement your own wait method to check if the element you need is loaded.

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as ec

    WebDriverWait(driver, timeout=10).until(
        ec.visibility_of_element_located((By.ID, "your_element_id"))
    )
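
If you are curious what such a wait does, it is essentially a polling loop. Below is a rough, Selenium-free sketch of the idea, not the library's actual implementation. wait_until, its parameters and the fake lookup are invented for illustration. With a real driver the condition would be a function that looks up your element.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=()):
    """Call `condition()` until it returns a truthy value or `timeout` seconds pass.

    Exceptions listed in `ignored` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for "element not in the DOM yet": fails twice, then succeeds.
attempts = {"count": 0}


def fake_element_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not found yet")
    return "element"


print(wait_until(fake_element_lookup, timeout=2, poll=0.01, ignored=(LookupError,)))
```

A short poll interval keeps the script responsive on a slow connection, and the timeout guarantees it does not hang forever if the element never shows up.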

Now you can start interacting with your element before the page is fully loaded!

Camiel Kerkhofs answered Sep 18 '22 20:09