
How to extract info within a #shadow-root (open) using Selenium Python?

I have the following URL for an online store, https://www.tiendasjumbo.co/buscar?q=mani, and I can't extract the product label or other fields:

from selenium import webdriver
import time
from random import randint

driver = webdriver.Firefox(executable_path=r"C:\Program Files (x86)\geckodriver.exe")
driver.implicitly_wait(10)
time.sleep(4)

url =  "https://www.tiendasjumbo.co/buscar?q=mani"
driver.maximize_window()
driver.get(url)
driver.find_element_by_xpath('//h1[@class="impulse-title"]')

What am I doing wrong? I also tried switching into the iframes, but that didn't get me anywhere either. Any help is welcome.

Alexis AG asked Nov 27 '20 23:11





1 Answer

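The products within the website https://www.tiendasjumbo.co/buscar?q=mani are located within a #shadow-root (open). A lookup that starts from the document, such as your XPath, does not reach into a shadow root, which is why the element is not found.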

impulse-search


Solution

To extract the product label, you have to query inside the shadow root with shadowRoot.querySelector(). You can use the following Locator Strategy:

  • Code Block:

    driver.get('https://www.tiendasjumbo.co/buscar?q=mani')
    item = driver.execute_script("return document.querySelector('impulse-search').shadowRoot.querySelector('div.group-name-brand h1.impulse-title span.formatted-text')")
    print(item.text)
    
  • Console Output:

    La especial mezcla de nueces, maní, almendras y marañones x 450 g
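
If the product list renders a moment after the page loads, the one-shot query above can return nothing. A minimal sketch of the same lookup with a small retry loop is below. The helper name `shadow_text` and its `timeout`/`poll` parameters are my own, not part of Selenium; the only Selenium call used is `driver.execute_script(script, *args)`. Note that it reads `textContent` in the page, which may differ slightly from the visible text that `item.text` returns.

```python
import time

# JavaScript run in the page: find the shadow host, then query inside its
# open shadow root. arguments[0] / arguments[1] are the values passed to
# execute_script after the script string.
_SHADOW_QUERY_JS = """
const host = document.querySelector(arguments[0]);
if (!host || !host.shadowRoot) { return null; }
const node = host.shadowRoot.querySelector(arguments[1]);
return node ? node.textContent.trim() : null;
"""


def shadow_text(driver, host_css, inner_css, timeout=10.0, poll=0.5):
    """Return the text of inner_css inside host_css's open shadow root.

    Hypothetical helper: retries until the node exists or `timeout`
    seconds have passed, and returns None if it never shows up.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = driver.execute_script(_SHADOW_QUERY_JS, host_css, inner_css)
        if text is not None:
            return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

With the selectors from the answer, the call would be `shadow_text(driver, 'impulse-search', 'div.group-name-brand h1.impulse-title span.formatted-text')`.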
    

References

You can find a couple of relevant discussions in:

  • Unable to locate the Sign In element within #shadow-root (open) using Selenium and Python
  • How to locate the First name field within shadow-root (open) within the website https://www.virustotal.com using Selenium and Python

Microsoft Edge and Google Chrome version 96

Chrome v96 has changed the shadow root return values for Selenium. Some helpful links:

  • Java - full example on GitHub
  • Shadow DOM in Selenium
  • Python - full example on GitHub
  • Shadow DOM and Selenium with Chromium 96
  • C# - full example on GitHub
  • Shadow DOM in Ruby Selenium
  • Ruby - full example on GitHub
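
For the version 96 change above, here is a rough sketch of the newer route that avoids JavaScript altogether. It assumes Selenium 4.1 or later with a Chromium-based browser at version 96 or later (support in other browsers depends on their driver), where `WebElement.shadow_root` returns a shadow root object that has its own `find_element`. The function name `shadow_text_native` is hypothetical. CSS selectors are used because, as far as I know, XPath lookups inside a shadow root are not reliably supported.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR, spelled
# out so this sketch has no import-time dependency. In real code, importing
# By and using By.CSS_SELECTOR is the usual choice.
CSS = "css selector"


def shadow_text_native(driver, host_css, inner_css):
    """Read text inside an open shadow root via WebElement.shadow_root.

    Hypothetical helper; assumes Selenium 4.1+ and a browser/driver that
    supports the shadow root endpoint.
    """
    host = driver.find_element(CSS, host_css)   # the shadow host element
    root = host.shadow_root                     # shadow root attached to the host
    return root.find_element(CSS, inner_css).text
```

With the selectors from the answer, the call would be `shadow_text_native(driver, 'impulse-search', 'div.group-name-brand h1.impulse-title span.formatted-text')`.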
undetected Selenium answered Oct 19 '22 10:10
