
Extracting texts from <li> items with selenium in Python

I'm trying to get the text inside an <a> tag in a nested ul-li structure. I can locate all the <li> elements, but I can't get the text inside the <a> elements.

I'm using Python 3.7 and Selenium WebDriver with the Firefox driver.

The corresponding HTML is:

[some HTML]

<ul class="dropdown-menu inner">
<!---->
    <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option first-in-group group-item">
        <span class="dropdown-header">Cursos em Destaque</span>
        <a tabindex="0">Important TEXT 1</a>
    </li>
    <!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
    <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
        <span class="dropdown-header">Cursos em Destaque</span>
        <a tabindex="0">Important TEXT 2</a>
    </li>
    <!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
    <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
        <span class="dropdown-header">Cursos em Destaque</span>
        <a tabindex="0">Important TEXT 3</a>
    </li>
    <!-- end nyaBsOption: curso in ctrl.cursos group by curso.grupo -->
    <li nya-bs-option="curso in ctrl.cursos group by curso.grupo" class="nya-bs-option group-item">
        <span class="dropdown-header">Cursos em Destaque</span>
        <a tabindex="0">Important TEXT4</a>
    </li>
    [another 100 similar <li></li> blocks]
    <li class="no-search-result" placeholder="Curso">
        <span>Unimportant TEXT</span>
    </li>
</ul>

[more HTML]

I've tried the code below:

cursos = browser.find_elements_by_xpath('//li[@nya-bs-option="curso in ctrl.cursos group by curso.grupo"]')
nome_curso = [curso.find_element_by_tag_name('a').text for curso in cursos]

I get a list with the correct number of items, but every item is an empty string (''). Can anyone help me? Thanks.

Gerson Bronstein asked Jun 28 '26 07:06


1 Answer

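You were close. The empty strings are most likely because the text attribute returns only rendered (visible) text, so elements that are hidden or not yet rendered yield ''. To extract the texts (Important TEXT 1, Important TEXT 2, Important TEXT 3, Important TEXT4, etc.), wait for the elements with WebDriverWait and visibility_of_all_elements_located(), then use either of the following locator strategies (driver here is the WebDriver instance, i.e. browser in your code):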

  • Using CSS_SELECTOR and get_attribute() method:

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.dropdown-menu.inner li.nya-bs-option a")))])
    
  • Using XPATH and text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='dropdown-menu inner']//li[contains(@class, 'nya-bs-option')]//a")))])
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    

You can find a relevant discussion in How to retrieve the title attribute through Selenium using Python?


Outro

As per the documentation:

  • get_attribute() method: gets the given attribute or property of the element.
  • text attribute: returns the text of the element.
  • Difference between text and innerHTML using Selenium
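
To make the difference between the two concrete, here is a minimal sketch, not part of the original answer, of a helper that reads text first and falls back to the textContent property when text is empty (as it is for elements that are not displayed). The name texts_from_elements is hypothetical, and the helper only assumes each element exposes .text and get_attribute(), as Selenium WebElements do.

```python
def texts_from_elements(elements):
    """Return the stripped text of each element.

    Hypothetical helper: `elements` is any iterable of objects that expose
    `.text` and `.get_attribute(name)`, such as Selenium WebElements.
    """
    texts = []
    for element in elements:
        # .text yields only rendered text, so it is '' for hidden elements.
        text = element.text
        if not text:
            # textContent is read from the DOM regardless of visibility.
            text = element.get_attribute("textContent") or ""
        texts.append(text.strip())
    return texts
```

With the question's code this could be called on the <a> elements found inside each item of cursos. Note that the fallback returns text even for elements the user cannot see, which may or may not be what you want.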
undetected Selenium answered Jun 30 '26 23:06