Parsing HTML5 data-* attribute values with Selenium in Python

I am parsing a JS-generated webpage like so:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('https://www.consumerbarometer.com/en/graph-builder/?question=M1&filter=country:singapore,canada,mexico,brazil,argentina,united_states,bulgaria,austria,belgium,croatia,czech_republic,denmark,estonia,finland,france,germany,greece,hungary,italy,ireland,latvia,lithuania,norway,netherlands,poland,portugal,russia,romania,serbia,slovakia,spain,slovenia,sweden,switzerland,ukraine,united_kingdom,australia,china,israel,hong_kong_sar,japan,korea,new_zealand,malaysia,taiwan,turkey,vietnam')

# wait for the svg to appear
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.TAG_NAME, 'svg')))

for text in driver.find_elements_by_class_name('bar-text-label'):
    print(text.text)

driver.close()

Besides getting the text from elements with the class bar-text-label, I would also like to get values from an HTML5 data-* attribute. For example, given <rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar"></rect>, I would like to be able to parse 76 from it.

Is this possible to do in Selenium?

I tried both of the below, with no success:

for text in driver.find_elements_by_class_name('bar'): 
    print(data_value.text)

for data in driver.find_elements_by_xpath('//*[contains(@data-value)]/@data-value'): 
    print(data.text)
metersk asked Feb 04 '15 16:02


1 Answer

If you have elements like the following:

<rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="75" class="bar">bar1</rect>
<rect rx="3" ry="3" width="76%" height="40" transform="translate(0,40)" data-value="76" class="bar">bar2</rect>

You can get the text value and the attribute value as follows:

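from selenium.webdriver.common.by import By

# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name,
# which newer Selenium 4 releases no longer provide
elements = driver.find_elements(By.CLASS_NAME, 'bar')
for element in elements:
    print(element.text)                         # visible text of the element
    print(element.get_attribute('data-value'))  # value of the data-* attribute, as a string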

This prints out:

bar1
75
bar2
76
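
As a rough sketch of the same idea, the loop can be wrapped in a small helper that pairs each element's text with its data-value converted to a number. The names collect_data_values and scrape, and the CSS selector rect.bar[data-value], are assumptions made for illustration and are not part of the original answer. The helper relies only on .text and get_attribute(), and it assumes the attribute holds whole numbers, as in the sample markup.

```python
def collect_data_values(elements, attribute="data-value"):
    """Pair each element's text with its attribute value parsed as an int.

    `elements` is any iterable of objects exposing `.text` and
    `.get_attribute(name)`, such as Selenium WebElements.
    Elements that lack the attribute are skipped.
    """
    pairs = []
    for element in elements:
        raw = element.get_attribute(attribute)  # a string, or None if absent
        if raw is None:
            continue
        # Assumes whole-number values such as "76"; int() raises otherwise.
        pairs.append((element.text, int(raw)))
    return pairs


def scrape(url):
    """Hypothetical driver code; needs Selenium and a local Firefox."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Wait for the JS-rendered chart, as in the question.
        WebDriverWait(driver, 10).until(
            EC.visibility_of_element_located((By.TAG_NAME, "svg"))
        )
        # Assumed selector: <rect class="bar"> elements carrying data-value.
        bars = driver.find_elements(By.CSS_SELECTOR, "rect.bar[data-value]")
        return collect_data_values(bars)
    finally:
        driver.quit()
```

For the two sample elements above, collect_data_values would return [('bar1', 75), ('bar2', 76)]. Keeping the parsing separate from the browser code means it can be checked with simple stand-in objects, without launching Firefox.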
Jessamyn Smith answered Sep 30 '22 19:09
