Trouble parsing tabular items from a graph located in a website

Question

I'm trying to extract the tabular content shown on a graph in a webpage. The contents of those tables are only visible when you hover the cursor over the graph area. One such table is this one.

Webpage address

The graph containing the tables is titled EPS consensus revisions : last 18 months.

Here is what I've tried so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.marketscreener.com/SUNCORP-GROUP-LTD-6491453/revisions/"

driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 10)
for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#graphRevisionBNAeec span > table tr"))):
    data = [item.text for item in items.find_elements_by_css_selector("td")]
    print(data)
driver.quit()

When I run the above script, it throws the following error, pointing at the for items in wait.until(...) line:

raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

Output from a single table out of many should look like:

Period: Thursday, Aug 22, 2019
Number of upgrading estimates: 0
Number of unchanged estimates: 7
Number of Downgrading estimates: 0
High Value: 0.90 AUD
Mean Value: 0.85 AUD
Low Value: 0.77 AUD

How can I get the content of those tables from that graph?

EDIT: I'm still hoping for a solution that relies purely on browser automation.

kmaork · Accepted Answer

You'll be much better off querying the website's backend directly than using Selenium to scrape the frontend, for three important reasons:

Speed: Using the API directly is much faster and more efficient, because it fetches only the data you need, doesn't have to wait for JavaScript to run or pixels to render, and avoids the overhead of running a webdriver.
Stability: Changes to the frontend are usually much more frequent and harder to follow than changes to the backend. If your code relies on the site's frontend, it will probably stop working soon after they make some UI changes.
Accuracy: Sometimes the data displayed in the UI is inaccurate or incomplete. For example, this website rounds all numbers to two decimal places, while the backend sometimes provides more than twice as many.

Here's how you could easily use the backend API:

import requests
# API url found using chrome devtools
url = 'https://www.marketscreener.com/charting/afDataFeed.php?codeZB=6491453&t=eec&sub_t=bna&iLang=2'
# Send a Chrome User-Agent because the API apparently blocks the default python-requests one
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
# Make a request to the API and parse the JSON response
data = requests.get(url, headers=headers).json()[0]
# A function to find data for a specific date
def get_vals(date):
    vals = []
    for items in data:
        for item in items:
            if item['t'] == date:
                vals.append(item['y'])
                break
    return vals
# Use the function above with the example table given in the question
print(get_vals('Thursday, Aug 22, 2019'))

Running this outputs the list [0.9, 0.84678, 0.76628, 0, 7, 0], which as you can see is the data you wanted to extract from the table you gave as an example.
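To get every table rather than a single date, the same response can be regrouped by date. The following is only a sketch: values_by_date and label_values are hypothetical helper names, and the label order is an assumption inferred by comparing the output above with the table in the question (the two zero counts cannot be told apart from that one example, so verify the order against another date before relying on it).

```python
# Assumed label order for the six series returned by the API.
# This is inferred from one example, not documented anywhere.
LABELS = [
    "High Value",
    "Mean Value",
    "Low Value",
    "Number of upgrading estimates",
    "Number of unchanged estimates",
    "Number of downgrading estimates",
]


def values_by_date(series_list):
    """Group the 'y' values of every series by their 't' date string.

    Each date maps to a list with one slot per series; a slot stays None
    when that series has no point for the date.
    """
    tables = {}
    for index, series in enumerate(series_list):
        for point in series:
            row = tables.setdefault(point["t"], [None] * len(series_list))
            # Keep the first match per series, like get_vals() does
            if row[index] is None:
                row[index] = point["y"]
    return tables


def label_values(vals, labels=LABELS):
    """Pair each value with its (assumed) label."""
    return dict(zip(labels, vals))
```

With the data variable from the snippet above, values_by_date(data) would return one row per date, and label_values(row) turns a row into the labelled form shown in the question.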

frianH · Answer

Try replacing this locator:

By.CSS_SELECTOR, "#graphRevisionBNAeec span > table tr"

With this:

By.XPATH, "//*[@class='tabElemNoBor overfH']"

This prints the following to the console:

[u'EPS consensus revisions : last 18 months', u'EPS consensus revisions : last 18 months', u'Number of Estimates
EPS 2020(AUD)
Number of upgrading estimates
High Value
Number of unchanged estimates
Mean Value
Number of downgrading estimates
Low Value
Mar 18
Apr 18
May 18
Jun 18
Jul 18
Aug 18
Sep 18
Oct 18
Nov 18
Dec 18
Jan 19
Feb 19
Mar 19
Apr 19
May 19
Jun 19
Jul 19
Aug 19
Sep 19
Oct 19
0
2
4
6
8
10
12
0.2
0.4
0.6
0.8
1
1.2
1.4
\xa9marketscreener.com - S&P Global Market Intelligence']

brfh · Answer

This is a solution using Selenium (I tested my code with Firefox, but it works fine with Chrome too):

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Firefox()
actions = ActionChains(driver)

driver.get("https://www.marketscreener.com/SUNCORP-GROUP-LTD-6491453/revisions/")

# Locate the chart's container table (change the XPath to target a different table).
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed.
table = driver.find_element(By.XPATH, "//table[@class = 'tabElemNoBor overfH']")
# Hover over the chart so that the tooltip gets rendered
actions.move_to_element(table).perform()

tooltip = "//table[@class = 'tabElemNoBor overfH']//div[@class = 'highcharts-label highcharts-tooltip highcharts-color-undefined']"
date = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.XPATH, tooltip + "/span/span//b"))).text
cells = WebDriverWait(driver, 60).until(EC.presence_of_all_elements_located((By.XPATH, tooltip + "//td")))
data = [cell.get_attribute("innerHTML") for cell in cells]
# Cells alternate: even indexes are paired as labels, odd indexes as values
data_1 = data[0::2]
# Drop the first 3 characters and everything from the first "&" onward
data_2 = [value[3:value.find("&")] for value in data[1::2]]
data = list(zip(data_1, data_2))
print(date)
for label, value in data:
    print(label, value)

I simply hover over the chart, which makes the page generate the HTML for the info table. To get the table for a different date, move the mouse to another point on the chart.
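As a rough sketch of that mouse-move idea, the snippet below hovers at evenly spaced horizontal positions across an element. hover_offsets and hover_across are hypothetical helper names, and the default of 20 positions is an arbitrary choice. The offsets are measured from the element's center, which is how recent Selenium 4 releases interpret move_to_element_with_offset; older releases measured from the top-left corner, so adjust accordingly.

```python
def hover_offsets(width, points):
    """Return x offsets, measured from the element's center, for `points`
    evenly spaced positions across an element `width` pixels wide."""
    if points < 1:
        raise ValueError("points must be at least 1")
    step = width / points
    # Middle of each equal-width slice, shifted so that 0 is the center
    return [round((i + 0.5) * step - width / 2) for i in range(points)]


def hover_across(driver, element, points=20):
    """Hover at each offset in turn, yielding the offset after every move
    so the caller can read the tooltip before the mouse moves on."""
    # Imported lazily so hover_offsets() works without Selenium installed
    from selenium.webdriver.common.action_chains import ActionChains

    for x in hover_offsets(element.size["width"], points):
        ActionChains(driver).move_to_element_with_offset(element, x, 0).perform()
        yield x
```

Looping with for _ in hover_across(driver, table): and reading the tooltip inside the loop, as in the code above, should collect one table per position. Positions that fall outside the plotted area may leave the tooltip unchanged, so duplicates are worth filtering out.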

Tags: python, python-3.x, selenium, selenium-webdriver, web-scraping

MITHU

