I am trying to scrape MMA odds data from this site by parsing a few Highcharts charts. I use Selenium to click a link and then switch to the chart: I go to this site and click on +420 in the Artem Lobov row of the Pinnacle column, which opens a pop-out chart. Then I switch to the active element. I would like to capture the graph that Highcharts draws in response to the click.
I use selenium in the following manner:
actions = ActionChains(driver)
actions.move_to_element(driver.find_element_by_id(pin_id))
actions.click()
actions.perform()
time.sleep(3)
driver.switch_to_active_element()
I was able to click the link and get the chart, but I am a bit lost on how Highcharts works. I am trying to parse highcharts-series-group here and get the values in the chart.
I believe the data can be found by:
soup = bs4.BeautifulSoup(driver.page_source, "lxml")  # page_source is already an HTML string, not a file path
data = soup.find_all('g', {"class":"highcharts-series-group"})[-1].find_all("path")
However, this provides the following, and it is not clear how a chart is created from the data. As noted in the comments, it appears to be SVG.
During inspection the data appears to be in <g class="highcharts-series" and <g class="highcharts-series-tracker but it's not clear how Highcharts graphs it from this data.
How does Highcharts draw the graph from the saved data? Is there a clean way to get the data from the highcharts-series-group as displayed?
I could not figure out how to convert the SVG data into what is displayed on the graph you mentioned, but I wrote the following Selenium Python script:
from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get('https://www.bestfightodds.com/events/ufc-fight-night-108-swanson-vs-lobov-1258')
actions = webdriver.ActionChains(driver)
actions.move_to_element(driver.find_element_by_id('oID1013467091'))
actions.click()
actions.perform()
time.sleep(3)
driver.switch_to_active_element()
chart_number = driver.find_element_by_id('chart-area').get_attribute('data-highcharts-chart')
chart_data = driver.execute_script('return Highcharts.charts[' + chart_number + '].series[0].options.data')
for point in chart_data:
    e = driver.execute_script('return oneDecToML('+ str(point.get('y')) + ')')
    print(point.get('x'), e)
Here we use the Highcharts API to read the series data straight from the chart object, plus a JavaScript function from the page source (oneDecToML) that converts the server response for this chart into the values shown on the graph.
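If you would rather avoid one execute_script round trip per point, the conversion can be done on the Python side. This is a minimal sketch that assumes the y values are decimal odds and that oneDecToML applies the standard decimal-to-moneyline formula; I have not checked how the site's function rounds, and decimal_to_moneyline is my own name, not something from the page.

```python
def decimal_to_moneyline(decimal_odds: float) -> str:
    """Convert decimal odds to an American moneyline string.

    Assumption: this mirrors the standard formula, not necessarily the
    exact rounding used by the site's own oneDecToML function.
    """
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    if decimal_odds >= 2.0:
        # Underdog: profit on a 100 unit stake.
        return f"+{round((decimal_odds - 1.0) * 100)}"
    # Favourite: stake needed to win 100 units.
    return f"-{round(100 / (decimal_odds - 1.0))}"


if __name__ == "__main__":
    for odds in (5.2, 2.0, 1.8):
        print(odds, decimal_to_moneyline(odds))
```

In the loop above you would then call decimal_to_moneyline(point.get('y')) instead of going back to the browser.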
Another method is to reconstruct the data from the SVG data list described above using the linear equation y = mx + b. If actual data values are known (and data points are often displayed on Highcharts charts), the slope can be calculated very accurately. Given that the intercept is known (see below), I ran a regression on 3 known points and it reproduced them precisely (zero error).
Another method, described in detail here, is reconstructing the data from the highcharts-yaxis-labels, but its suitability depends on the data and the required accuracy. Extract the y and text values as x and y respectively and run a regression analysis.
y="148"... >-125<
y="117"... >+100<
y="85"... >+120<
y="54"... >+140<
y="23"... >+160<
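A rough sketch of that extraction step, using only the standard library. The SVG fragment is hypothetical: only the y attributes and label texts come from the values listed above, and the real markup will carry more attributes (and possibly an SVG namespace, in which case BeautifulSoup may be the easier tool).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the y values and label texts are from the chart.
SAMPLE = """
<g class="highcharts-axis-labels highcharts-yaxis-labels">
  <text x="0" y="148">-125</text>
  <text x="0" y="117">+100</text>
  <text x="0" y="85">+120</text>
  <text x="0" y="54">+140</text>
  <text x="0" y="23">+160</text>
</g>
"""


def label_points(svg_fragment):
    """Return (pixel_y, label_value) pairs from a y-axis label group."""
    root = ET.fromstring(svg_fragment.strip())
    points = []
    for text in root.iter("text"):
        pixel_y = float(text.get("y"))
        # itertext() also picks up labels wrapped in nested <tspan> elements.
        value = float("".join(text.itertext()).strip())
        points.append((pixel_y, value))
    return points


points = label_points(SAMPLE)
# Keep only the positive moneyline labels for the straight-line fit.
usable = [(py, v) for py, v in points if v >= 100]

if __name__ == "__main__":
    print(usable)
```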
It is useful to plot the values in a chart, especially in this case, because the relationship is not linear. Fortunately, discarding the -125 value gives a nice straight line, and none of the values are less than 100.
x y
117 100
85 120
54 140
23 160
x -0.638938504720592
R^2 0.999938759887726
The x coefficient in the regression output above is the slope of the line, so m = -0.638938504720592.
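The slope and R^2 quoted above can be reproduced without a spreadsheet. This is a small ordinary least squares sketch in plain Python over the four label points; fit_slope is just an illustrative name.

```python
def fit_slope(points):
    """Ordinary least squares slope and R^2 for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# (label pixel y, label value) with the -125 label discarded.
label_points = [(117, 100), (85, 120), (54, 140), (23, 160)]
m, r2 = fit_slope(label_points)

if __name__ == "__main__":
    print("slope:", m)
    print("R^2:", r2)
```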
What about the intercept? The most common coordinate system has a bottom-left origin, but SVG uses a top-left coordinate system. This means the intercept has to be adjusted to the top of the chart. Given that this dataset has a value for the top of the chart, the easiest way is to just use the top y as b = 160.
Extract the data list using your preferred method (not described in this answer) and reconstruct the data with the linear equation.
e.g. ...L 999999 101 ....
y = -0.638938504720592 * 101 + 160 ≈ 95
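For completeness, here is one possible way to apply the linear equation above to a whole path. It assumes the series is drawn with plain M/L commands in the path's d attribute and uses the m and b from this answer; the path string itself is made up for illustration.

```python
import re

M_SLOPE = -0.638938504720592  # slope from the regression above
B_TOP = 160                   # value at the top of the chart, as chosen above

# Matches "M x y" or "L x y" pairs; curves and other commands are ignored.
POINT_RE = re.compile(r"[ML]\s*(-?\d+(?:\.\d+)?)[\s,]+(-?\d+(?:\.\d+)?)")


def reconstruct(path_d, m=M_SLOPE, b=B_TOP):
    """Map each (pixel_x, pixel_y) in a path to (pixel_x, estimated value)."""
    return [(float(px), m * float(py) + b) for px, py in POINT_RE.findall(path_d)]


# Hypothetical path data; a real one comes from the <path d="..."> attribute.
sample_d = "M 0 101 L 50 85 L 100 0"
values = reconstruct(sample_d)

if __name__ == "__main__":
    for px, value in values:
        print(px, round(value, 2))
```

The x coordinates are left in pixels here; mapping them to timestamps would need the same kind of fit against the x-axis labels.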
Reconstructing the data from the y-axis may not be as accurate as using the actual data. If you are lucky, the yaxis-labels will sit on round values so you get precise results, but each label position can be up to half a unit out at the top and bottom of the range, so the worst case is (1/2 + 1/2) / 94 = 1.06% in this example. The actual error is likely much less.