
Loop through links using Selenium Webdriver (Python)

Afternoon all. Currently trying to use Selenium webdriver to loop through a list of links on a page. Specifically, it's clicking a link, grabbing a line of text off said page to write to a file, going back, and clicking the next link in a list. The following is what I have:

    def test_text_saver(self):
        driver = self.driver
        textsave = open("textsave.txt","w")
        list_of_links = driver.find_elements_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li")
        """Initializing Link Count:"""
        link_count = len(list_of_links)
        while x <= link_count:
            print x
            driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li["+str(x)+"]/a").click()
            text = driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text
            textsave.write(text+"\n\n")
            driver.implicitly_wait(5000)
            driver.back()
            x += 1
        textsave.close()

When run, it goes to the initial page, and...goes back to the main page, rather than the subpage that it's supposed to. Printing x, I can see it's incrementing three times rather than one. It also crashes after that. I've checked all my xpaths and such, and also confirmed that it's getting the correct count for the number of links in the list.

Any input's hugely appreciated--this is really just to flex my python/automation, since I'm just getting into both. Thanks in advance!!

TRoch asked Feb 04 '26 03:02


1 Answer

I'm not sure if this will fix the problem, but in general it is better to use WebDriverWait rather than implicitly_wait, since WebDriverWait.until will keep calling the supplied function (e.g. driver.find_element_by_xpath) until the returned value is not False-ish or the timeout (e.g. 5000 seconds) is reached -- at which point it raises a selenium.common.exceptions.TimeoutException.
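
To make the polling behaviour concrete, here is a rough, Selenium-free sketch of what until does conceptually. The names poll_until and PollTimeout are made up for illustration (they are not part of Selenium). The real WebDriverWait additionally passes the driver to the condition and, by default, ignores NoSuchElementException between attempts:

```python
import time


class PollTimeout(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def poll_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises PollTimeout once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            # Anything that is not False-ish ends the wait immediately.
            return value
        if time.monotonic() >= deadline:
            raise PollTimeout("condition still False-ish after %s seconds" % timeout)
        time.sleep(poll_interval)
```

The point is that the wait ends as soon as the condition succeeds, so a generous timeout only costs time when something is actually wrong.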

    import selenium.webdriver.support.ui as UI

    def test_text_saver(self):
        driver = self.driver
        wait = UI.WebDriverWait(driver, 5000)
        with open("textsave.txt","w") as textsave:
            list_of_links = driver.find_elements_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a")
            for link in list_of_links:  # 2
                link.click()   # 1
                text = wait.until(
                    lambda driver: driver.find_element_by_xpath("//*[@id=\"learn-sub\"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text)
                textsave.write(text+"\n\n")
                driver.back()

  1. After you click the link, you should wait until the linked url is loaded. So the call to wait.until is placed directly after link.click().
  2. Instead of using

    while x <= link_count:
        ...
        x += 1
    

    it is better to use

    for link in list_of_links: 
    

    For one thing, it improves readability. Moreover, you really don't need to care about the number x; all you really care about is looping over the links, which is what the for-loop does.
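
One caveat with looping over list_of_links directly: as far as I understand it, the element objects belong to the page they were found on, so after driver.back() a later link.click() can raise StaleElementReferenceException. A possible workaround, sketched here under the assumption that every link has a plain href, is to read the URLs up front and visit them with driver.get. The function name save_headings and its parameters are hypothetical. find_element("xpath", ...) is the generic locator form, which should also work on newer Selenium releases where the find_element_by_* helpers have been removed.

```python
LINKS_XPATH = '//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a'
HEADING_XPATH = '//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1'


def save_headings(driver, wait, out):
    """Visit every link matched by LINKS_XPATH and write each page's heading.

    driver: a Selenium WebDriver already showing the page with the links
    wait:   a WebDriverWait built on that driver
    out:    any writable text file object
    Returns the number of pages visited.
    """
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
    links = driver.find_elements("xpath", LINKS_XPATH)
    # Copy the URLs before navigating: the element objects are only
    # valid while the page they came from is still loaded.
    urls = [link.get_attribute("href") for link in links]
    for url in urls:
        driver.get(url)
        # Poll until the heading exists and has non-empty text.
        heading = wait.until(
            lambda d: d.find_element("xpath", HEADING_XPATH).text)
        out.write(heading + "\n\n")
    return len(urls)


# Assumed usage with a real browser (not run here):
#   from selenium.webdriver.support.ui import WebDriverWait
#   wait = WebDriverWait(driver, 10)
#   with open("textsave.txt", "w") as textsave:
#       save_headings(driver, wait, textsave)
```

Because nothing is clicked, there is no need for driver.back() at all. If the links only work via JavaScript click handlers, this approach will not apply, and re-finding the links on each iteration would be the fallback.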

unutbu answered Feb 05 '26 17:02


