
How to refresh Selenium Webdriver DOM data without reloading page?

I use Selenium with Python to parse search results from a database site. The search output is dynamic: when I type a new request, the page is not reloaded, but the search results are replaced.

The problem is that Selenium doesn't update the WebDriver DOM data, so the next time I call something like driver.find_elements_by_class_name('query_header'), I receive elements from the previous search request and get a stale element error (StaleElementReferenceException).

Using WebDriverWait(driver, timeout).until(element_present) doesn't help. The elements are there (all search result blocks have the same classes, names, etc.), but they're old :)

I fixed it by reloading the page with driver.refresh() after each request, but that feels unnatural and doubles the number of requests.

Is there any way to refresh Selenium's DOM data so that find_elements returns the new elements without a page reload?

sortas asked Jan 14 '18 03:01


2 Answers

Without knowing the content of the page, it's hard to craft a solution to your problem.

When your Selenium code selects elements, it queries the page as it exists at the moment the selector runs, so the page does not need to be reloaded to retrieve new elements. Instead, it seems more likely that the new elements didn't exist on the page yet: the search results may not have finished loading when your selector ran, so it returned the old elements.


A simple solution would be to increase the wait time between starting the search and selecting the search results, giving the page time to load them:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Load page
driver = webdriver.Firefox()
driver.get('https://www.example.com')

# Begin search (the find_element_by_* helpers were removed in Selenium 4; use find_element with By)
driver.find_element(By.TAG_NAME, 'a').click()

# Wait a fixed 5 seconds for search results to load
time.sleep(5)

# Retrieve search results
results = driver.find_elements(By.CLASS_NAME, 'result')

The downside is that this depends heavily on network quality and on how long the search query takes to execute: too short a sleep still returns the old results, and a longer one wastes time on every search.


A more complex but canonical solution would be to wait explicitly until the page has loaded the search results, for example by checking for an Ajax loading icon or by detecting that the results changed. A good place to start is Selenium's WebDriverWait.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

# Load page
driver = webdriver.Firefox()
driver.get('https://www.example.com')

# Begin search (find_element with By replaces the removed find_element_by_* helpers)
driver.find_element(By.TAG_NAME, 'a').click()

# Wait for search results to load
WebDriverWait(driver, 30).until(
    expected_conditions.invisibility_of_element_located((By.ID, 'ajax_loader'))
)

# Retrieve search results
results = driver.find_elements(By.CLASS_NAME, 'result')

The downside of this method is that it may take a while to get working, and it needs to be customized for each page you want to wait on.
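The "seeing if the results changed" variant can be sketched as a custom wait condition. This is only a sketch under assumptions: results_changed is a hypothetical helper (not part of Selenium), and it assumes that the old and new searches produce different visible text and that the new search returns at least one result. WebDriverWait.until accepts any callable that takes the driver, so defining the condition needs no Selenium import.

```python
def results_changed(locator, old_texts):
    """Build a wait condition that succeeds once the elements matching
    `locator` show text different from `old_texts`.

    `results_changed` is a hypothetical helper, not a Selenium API.
    """
    old_texts = list(old_texts)

    def condition(driver):
        # Re-query the DOM on every poll instead of reusing old references.
        elements = driver.find_elements(*locator)
        new_texts = [element.text for element in elements]
        # A falsy return value makes WebDriverWait keep polling.
        if new_texts and new_texts != old_texts:
            return elements
        return False

    return condition
```

With a real driver this might be used as WebDriverWait(driver, 30, ignored_exceptions=[StaleElementReferenceException]).until(results_changed((By.CLASS_NAME, 'query_header'), old_texts)), where old_texts was collected before submitting the new search. Ignoring StaleElementReferenceException lets the wait retry if an element is replaced while its text is being read. If two consecutive searches can return identical results, this condition would time out, so it is not a universal fix.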

You mentioned that waiting with WebDriverWait doesn't seem to work for you. One suggestion (if it doesn't break the page) is to manipulate the DOM before the search to remove any existing results or elements matching your selector, and then wait for the new results to load. That way, a WebDriverWait on the presence of elements matching your search-result selector can only be satisfied by the new results:

driver.execute_script("const el = document.getElementById('results'); if (el) { el.remove(); }")
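A slightly more general form of the same idea, as a sketch: remove every element matching a CSS selector, passing the selector as a script argument rather than splicing it into the JavaScript string. clear_matches and the '.result' selector are hypothetical names used for illustration; execute_script forwarding extra arguments to the script as arguments[0], arguments[1], and so on is standard Selenium behaviour.

```python
REMOVE_MATCHES_JS = """
const matches = document.querySelectorAll(arguments[0]);
matches.forEach(el => el.remove());
return matches.length;
"""


def clear_matches(driver, css_selector):
    """Remove all elements matching `css_selector`; return how many were removed.

    Hypothetical helper; `driver` is any object with Selenium's execute_script().
    """
    return driver.execute_script(REMOVE_MATCHES_JS, css_selector)
```

Calling clear_matches(driver, '.result') just before submitting a new search means a later wait for '.result' elements can only be satisfied by freshly inserted ones. Only do this if the page's own scripts don't depend on those nodes still being present.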

Additionally, since you mentioned that the page doesn't reload, the page is probably using Ajax to load search results and then modifying the DOM with JavaScript. It may be useful to inspect the network traffic (most browsers' DevTools have a "Network" tab) and try to reverse engineer how the website sends the search query and parses the response.

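import requests

# Search term (the demo endpoint searches bird names)
term = 'ja'

# Send the request; params= URL-encodes the search term
response = requests.get(
    'https://jqueryui.com/resources/demos/autocomplete/search.php',
    params={'term': term},
    timeout=10,
)
response.raise_for_status()

# Print the parsed JSON response
print(response.json())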

This may violate a site's terms of service or policies (as might any of these methods), so watch out for that. It may also be difficult at first to work out how to send and parse the requests yourself, compared with reading the DOM after the page has rendered the results. On the plus side, this is probably the best way to get search results in terms of performance and reliability, assuming the site uses an Ajax-style search.

andrewgu answered Sep 22 '22 23:09


You simply need to ask the driver to find the element again, reusing the same locator:

var X = driver.findElement(By.xpath("myxpath")); // suppose element A is returned
// ... do things
// the DOM is updated
// run the same command again:
var Y = driver.findElement(By.xpath("myxpath")); // element B is returned once the DOM has been updated

Y will then be a new object reflecting the updated DOM, even though the locator used to find it is exactly the same.
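In Python, the same idea can be wrapped in a small retry helper that looks the element up again whenever the old reference goes stale. This is a sketch, not part of Selenium: with_fresh_element and its parameters are hypothetical, and with a real driver stale_errors would be StaleElementReferenceException from selenium.common.exceptions.

```python
def with_fresh_element(find, use, stale_errors, attempts=3):
    """Call use(find()), re-running find() if the element went stale.

    find         -- callable returning a freshly located element
    use          -- callable that does something with that element
    stale_errors -- exception class (or tuple of classes) meaning "stale"
    attempts     -- maximum number of lookups before giving up
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        element = find()  # always locate again; never reuse an old reference
        try:
            return use(element)
        except stale_errors:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
```

For example, with_fresh_element(lambda: driver.find_element(By.XPATH, 'myxpath'), lambda el: el.text, StaleElementReferenceException) would read the text of whichever element currently matches the locator, retrying if the DOM is swapped out between the lookup and the read.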

Bastien Gallienne answered Sep 24 '22 23:09