Most questions about scraping table data from a web page with Python and Selenium involve a table that has an ID or class, plus some indexing technique based on counting rows and columns. The XPath technique is usually not explained either.
Say I have a table without an ID or class attribute; let's use this one as an example.
I want to return the value 'Johnson', without counting row or column numbers.
Here's my attempt (edited)...
import contextlib

import selenium.webdriver as webdriver
from selenium.webdriver.common.by import By

url = 'http://www.w3schools.com/html/html_tables.asp'
with contextlib.closing(webdriver.Firefox()) as driver:
    driver.get(url)
    columnref = 3
    rowref = 4
    # Select the cell by its 1-based row and column positions
    xpathstr = '//tr[position()=%d]//td[position()=%d]' % (rowref, columnref)
    data = driver.find_element(By.XPATH, xpathstr).text
    print(data)
I have gotten some good help here already, but I am still using an index. I need to generate columnref and rowref by looking them up from the header text 'Last Name' and the row value '3', respectively.
You can reach the cell you want with the CSS selector tbody > tr:nth-child(4) > td:nth-child(3), and you can build a selector for any other cell the same way. See below:
>>> driver.find_element_by_css_selector("tbody > tr:nth-child(4) > td:nth-child(3)")
<selenium.webdriver.remote.webelement.WebElement object at 0x10fdd4510>
>>> driver.find_element_by_css_selector("tbody > tr:nth-child(4) > td:nth-child(3)").text
u'Johnson'
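Because the selector is just a string, generating one for any cell is plain string formatting. A minimal sketch, assuming the header row is the first <tr> inside <tbody> (as the selector above implies), so nth-child positions count the header row too; cell_css is a hypothetical helper name:

```python
def cell_css(row, col):
    """Build a CSS selector for the cell at 1-based (row, col).

    nth-child counts every child of the parent, so row 1 is the header
    row when the header <tr> sits inside <tbody>.
    """
    if row < 1 or col < 1:
        raise ValueError("nth-child positions start at 1")
    return "tbody > tr:nth-child(%d) > td:nth-child(%d)" % (row, col)


print(cell_css(4, 3))  # tbody > tr:nth-child(4) > td:nth-child(3)
```

With a live driver, driver.find_element(By.CSS_SELECTOR, cell_css(4, 3)).text would then locate the same cell as the transcript above (By comes from selenium.webdriver.common.by).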
Alternatively, you can use the XPath position() function to locate the cell by its position. See below:
>>> driver.find_element_by_xpath("//tr[position()=4]//td[position()= 3]").text
u'Johnson'
>>> driver.find_element_by_xpath("//tr[position()=5]//td[position()= 3]").text
u'Smith'
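In both forms, tr[position()=4] means "the fourth <tr> child of its parent", so the header row is counted, and a leading // searches the whole page rather than one table. A small offline sketch with the standard library's xml.etree.ElementTree makes the counting visible. The table is a hypothetical stand-in laid out like the one in the question (the values other than the last names are illustrative placeholders), and ElementTree's limited XPath only accepts the abbreviated tr[4] form of tr[position()=4]:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in table: header row first, then data rows whose
# first cell holds the row number. Values are illustrative only.
SAMPLE_TABLE = """
<table>
  <tr><th>Number</th><th>First Name</th><th>Last Name</th><th>Points</th></tr>
  <tr><td>1</td><td>Eve</td><td>Jackson</td><td>94</td></tr>
  <tr><td>2</td><td>John</td><td>Doe</td><td>80</td></tr>
  <tr><td>3</td><td>Adam</td><td>Johnson</td><td>67</td></tr>
  <tr><td>4</td><td>Jill</td><td>Smith</td><td>50</td></tr>
</table>
"""


def cell_at(table, row, col):
    """Return the text of the <td> at 1-based (row, col), counting the header row."""
    # tr[n] selects the n-th <tr> among its siblings, like tr[position()=n]
    cell = table.find(".//tr[%d]/td[%d]" % (row, col))
    return None if cell is None else cell.text


table = ET.fromstring(SAMPLE_TABLE.strip())
print(cell_at(table, 4, 3))  # Johnson
print(cell_at(table, 5, 3))  # Smith
print(cell_at(table, 1, 3))  # None: the header row has <th> cells, not <td>
```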
If you want to get the text by column name and row number (the value in the first column), you can write a function that finds the index of the column and of the row, then reads the cell at that position:
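from selenium.webdriver.common.by import By

def get_text_column_row(table_css, header, row):
    # Scope every lookup to one table (driver is the open WebDriver)
    table = driver.find_element(By.CSS_SELECTOR, table_css)
    # Header cells (<th>) of the first row name the columns
    table_headers = table.find_elements(By.CSS_SELECTOR, 'tbody > tr:nth-child(1) > th')
    # First <td> of each data row holds the row number; the header row has no <td>
    table_rows = table.find_elements(By.CSS_SELECTOR, 'tbody > tr > td:nth-child(1)')
    index_of_column = None
    index_of_row = None
    for i, cell in enumerate(table_headers):
        if cell.text == header:
            index_of_column = i + 1  # XPath positions start at 1
    for i, cell in enumerate(table_rows):
        if cell.text == row:
            index_of_row = i + 2  # +1 for 1-based positions, +1 to skip the header row
    if index_of_column is None or index_of_row is None:
        raise ValueError('column %r or row %r not found' % (header, row))
    # The leading '.' keeps the XPath inside this table instead of the whole page
    xpath = './/tr[position() = %d]/td[position() = %d]' % (index_of_row, index_of_column)
    return table.find_element(By.XPATH, xpath).text
and use it like this:
>>> get_text_column_row('#main > table:nth-child(6)', 'Points', '2')
u'80'
>>> get_text_column_row('#main > table:nth-child(6)', 'Last Name', '3')
u'Johnson'
>>> get_text_column_row('#main > table:nth-child(6)', 'Last Name', '4')
u'Smith'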
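If you would rather not count anything in Python, both lookups can be folded into one XPath 1.0 expression: pick the table that has the header, the row whose first cell holds the row number, and compute the column position from the number of <th> cells before the matching header. The sketch below only builds the expression; cell_xpath is a hypothetical helper, and it assumes a single header row of <th> cells, no colspan, and values without apostrophes:

```python
def cell_xpath(header, row_key):
    """Build one XPath 1.0 expression for the cell under the column headed
    `header`, in the row whose first <td> reads `row_key`.

    Assumes one header row of <th> cells, no colspan, and values without
    apostrophes (they are embedded in single-quoted XPath literals).
    """
    if "'" in header or "'" in row_key:
        raise ValueError("values containing ' would need concat() quoting")
    # Column position = number of <th> before the matching header + 1,
    # evaluated from each candidate <td> inside its nearest <table>.
    col = ("count(ancestor::table[1]//th[normalize-space()='%s']"
           "/preceding-sibling::th) + 1" % header)
    # Only tables that have this header; only rows whose first cell is row_key.
    return ("//table[.//th[normalize-space()='%s']]"
            "//tr[td[1][normalize-space()='%s']]/td[%s]" % (header, row_key, col))


print(cell_xpath('Last Name', '3'))
```

With an open driver, driver.find_element(By.XPATH, cell_xpath('Last Name', '3')).text should then return the cell the question asks for, assuming the page's table matches its description. The predicate td[count(...) + 1] works because XPath compares a number-valued predicate against position().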