
Python Selenium - get href value

I am trying to copy the href value from a website, and the html code looks like this:

<p class="sc-eYdvao kvdWiq"> <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a> </p>

I've tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href"), but it raised 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returned None. I can't use XPath because the page has 20+ hrefs and I need to copy all of them; using XPath would only copy one.

If it helps, all the 20+ href are categorised under the same class which is sc-eYdvao kvdWiq.

Ultimately, I want to copy all 20+ hrefs and export them to a CSV file.

Appreciate any help possible.

Eric Choi asked Feb 25 '19 08:02




2 Answers

Use driver.find_elements (plural) when you expect more than one element. It returns a list, so call get_attribute on each item rather than on the list itself. In the CSS selector, target the descendants of those classes that actually carry an href attribute:

elems = driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq [href]")
links = [elem.get_attribute('href') for elem in elems]

You might also need a wait condition for presence of all elements located by css selector.

elems = WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".sc-eYdvao.kvdWiq [href]"))) 
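Since the question also asks about exporting to CSV: the links list built above is a plain list of strings, so the standard csv module should be enough. A minimal sketch, where the function name write_links_to_csv, the file name links.csv, the "href" header and the example.com URLs are all placeholders of mine, not anything from the page:

```python
import csv


def write_links_to_csv(links, path):
    """Write one URL per row under a single 'href' header (hypothetical helper)."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href"])
        for link in links:
            writer.writerow([link])


# Placeholder URLs; in the scraper, pass the `links` list collected from the page.
write_links_to_csv(
    ["https://example.com/a", "https://example.com/b"],
    "links.csv",
)
```

In the real script you would call write_links_to_csv(links, "links.csv") right after building links.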
QHarr answered Sep 16 '22 15:09


As per the given HTML:

<p class="sc-eYdvao kvdWiq">     <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park</a> </p> 

The href attribute belongs to the <a> tag, not to the <p>, so the locator has to go one level deeper, down to the <a> node. To extract the value of the href attribute you can use either of the following locator strategies:

  • Using css_selector:

    print(driver.find_element_by_css_selector("p.sc-eYdvao.kvdWiq > a").get_attribute('href')) 
  • Using xpath:

    print(driver.find_element_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a").get_attribute('href')) 
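To see why locating the <p> gave None in the question, here is a small stand-alone illustration that parses the same snippet with Python's built-in html.parser instead of Selenium. The AttrCollector class is a hypothetical helper written for this example only; it just records which attributes sit on which tag.

```python
from html.parser import HTMLParser

HTML = (
    '<p class="sc-eYdvao kvdWiq">'
    '<a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">'
    "Shah Alam Setia Eco Park, Setia Eco Park</a></p>"
)


class AttrCollector(HTMLParser):
    """Record the attributes of every start tag, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.attrs_by_tag = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.attrs_by_tag.setdefault(tag, []).append(dict(attrs))


parser = AttrCollector()
parser.feed(HTML)

p_attrs = parser.attrs_by_tag["p"][0]
a_attrs = parser.attrs_by_tag["a"][0]

print(p_attrs.get("href"))  # the <p> only has a class attribute
print(a_attrs.get("href"))  # the <a> is the element that carries the URL
```

The same reasoning applies in Selenium: get_attribute("href") has to be called on the <a> element, which is what the "> a" and "/a" parts of the locators above select.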

If you want to extract all the values of the href attribute you need to use find_elements* instead:

  • Using css_selector:

    print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_css_selector("p.sc-eYdvao.kvdWiq > a")]) 
  • Using xpath:

    print([my_elem.get_attribute("href") for my_elem in driver.find_elements_by_xpath("//p[@class='sc-eYdvao kvdWiq']/a")]) 

Dynamic elements

However, class values such as sc-eYdvao and kvdWiq look auto-generated, which suggests the elements are rendered dynamically. In that case, induce WebDriverWait for visibility_of_element_located() before reading the href, using either of the following locator strategies:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a"))).get_attribute('href')) 
  • Using XPATH:

    print(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a"))).get_attribute('href')) 

If you want to extract all the values of the href attribute you can use visibility_of_all_elements_located() instead:

  • Using CSS_SELECTOR:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "p.sc-eYdvao.kvdWiq > a")))]) 
  • Using XPATH:

    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//p[@class='sc-eYdvao kvdWiq']/a")))]) 

Note : You have to add the following imports :

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
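One caveat for readers on current Selenium: the find_element_by_* and find_elements_by_* helpers used in this thread were deprecated in Selenium 4 and later removed, in favour of find_element(By..., ...) and find_elements(By..., ...). The wait-based approach already uses By locators, so it carries over unchanged. A sketch of how the pieces could fit together; hrefs_from_elements and collect_links are hypothetical names of mine, and the default selector is simply the one from this answer:

```python
def hrefs_from_elements(elements):
    """Return the non-empty href values of WebElement-like objects."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip elements without an href
            hrefs.append(href)
    return hrefs


def collect_links(driver, selector="p.sc-eYdvao.kvdWiq > a", timeout=10):
    """Wait for the matching anchors to be present, then return their hrefs."""
    # Imported here so hrefs_from_elements stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
    )
    return hrefs_from_elements(elements)
```

With a live driver that has already loaded the page, links = collect_links(driver) should give the list of URLs. If the wait times out, WebDriverWait raises TimeoutException instead of returning an empty list.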
undetected Selenium answered Sep 17 '22 15:09