Extracting text from a JavaScript-rendered webpage using Selenium

I am trying to extract the text "This station managed by the Delta Flow Projects Office" from this website: https://waterdata.usgs.gov/ca/nwis/uv?site_no=381504121404001. The line sits inside the div with class stationContainer. Since this is a dynamic webpage, I'm using Selenium to get the rendered HTML.

This is the html from the website.

[image: screenshot of the page's HTML]

This is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = "https://waterdata.usgs.gov/ca/nwis/uv?site_no=381504121404001"
browser.get(url) #navigate to the page
innerHTML = browser.execute_script("return document.body.innerHTML")
elem = browser.find_elements_by_xpath("//div[@class='stationContainer']")

print (elem)

I get this result from my print statement:

[<selenium.webdriver.remote.webelement.WebElement (session="96fc124c0e2d1fd4cd86f61db272d52a", element="0.5862443940581294-1")>]

I'm hoping to derive the text by searching through the div class, but it seems I'm not going about this the right way.

saoirse asked Jun 30 '26 01:06



1 Answer

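elem is a list of WebElement objects, not a string, so printing it only shows the object's repr. Take the first element and read its text attribute: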

elem = browser.find_elements_by_xpath("//div[@class='stationContainer']")[0]
print(elem.text)

That prints all of the text inside the container, not just the line you want, so you will probably need a more specific selector or a way to parse that one line out of the rest.
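
One way to do that parsing, as a rough sketch rather than a definitive fix: treat the container's text as plain lines and keep the first one that contains a known phrase. The helper names find_line and container_text are made up for illustration, the 10 second timeout is an arbitrary choice, and the sample string is only a stand-in for the real page text. container_text assumes Selenium 4, where find_elements_by_xpath was removed in favor of By locators, and uses an explicit wait in case the div is rendered late.

```python
def find_line(block_text, phrase):
    """Return the first line containing phrase (case-insensitive), or None."""
    needle = phrase.lower()
    for line in block_text.splitlines():
        if needle in line.lower():
            return line.strip()
    return None


def container_text(browser, timeout=10):
    """Wait for the stationContainer div and return its visible text.

    Hypothetical helper; the timeout value is an assumption.
    """
    # Imported here so find_line can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, "//div[@class='stationContainer']")
    elem = WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located(locator)
    )
    return elem.text


# Stand-in for the container text; the real page content will differ.
sample = (
    "Some heading text\n"
    "  This station managed by the Delta Flow Projects Office  \n"
    "Some other text\n"
)
print(find_line(sample, "station managed by"))
```

With a live browser this would be called as find_line(container_text(browser), "station managed by"). Matching on a phrase is fragile if the page wording changes, so a narrower XPath is worth trying first if the line has its own element.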

JavaKungFu answered Jul 02 '26 14:07
