
Converting svg from Highcharts data into data points

I am trying to scrape MMA odds data from this site and parse a few Highcharts charts. I go to the site and click on +420 in the Artem Lobov row of the Pinnacle column, which opens a pop-out chart. I then switch to the active element. I would like to capture the data behind the graph that Highcharts draws in response to the click.

I use selenium in the following manner:

actions = ActionChains(driver)
actions.move_to_element(driver.find_element_by_id(pin_id))
actions.click()
actions.perform()
time.sleep(3)
driver.switch_to_active_element()

I was able to click the link and open the chart, but I am a bit lost on how Highcharts works.
I am trying to parse the highcharts-series-group element and get the values shown in the chart.

I believe the data can be found by:

soup = bs4.BeautifulSoup(driver.page_source, "lxml")
data = soup.find_all('g', {"class":"highcharts-series-group"})[-1].find_all("path")

However, this returns path elements, and it is not clear how the chart is drawn from them. As noted in the comments, the data appears to be SVG.

On inspection, the data appears to be in <g class="highcharts-series"> and <g class="highcharts-series-tracker">, but it's not clear how Highcharts draws the graph from this data.

How does Highcharts draw the graph from the saved data? Is there a clean way to get the data from highcharts-series-group as it is displayed?

Michael WS asked May 01 '17 22:05


2 Answers

I could not figure out how to convert the SVG data into what is displayed on the graph you mentioned, but I wrote the following Selenium script in Python, which reads the data from Highcharts directly:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://www.bestfightodds.com/events/ufc-fight-night-108-swanson-vs-lobov-1258')
# Click the odds cell to open the pop-out chart
actions = webdriver.ActionChains(driver)
actions.move_to_element(driver.find_element(By.ID, 'oID1013467091'))
actions.click()
actions.perform()
time.sleep(3)  # give the chart time to render
# Highcharts stores the chart's index in Highcharts.charts on its container element
chart_number = driver.find_element(By.ID, 'chart-area').get_attribute('data-highcharts-chart')
chart_data = driver.execute_script('return Highcharts.charts[' + chart_number + '].series[0].options.data')
for point in chart_data:
    # oneDecToML is a function defined by the page's own JavaScript
    e = driver.execute_script('return oneDecToML(' + str(point.get('y')) + ')')
    print(point.get('x'), e)

Here we use the Highcharts API, plus a JavaScript function from the page source that converts the server response for this chart into the values we see on the graph.
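The source of oneDecToML is not shown here. Assuming, from its name only, that it converts decimal odds to moneyline (American) odds, a plain-Python sketch of the standard conversion could look like the following. The name dec_to_moneyline is hypothetical, and the site's rounding and formatting may differ.

```python
def dec_to_moneyline(decimal_odds):
    """Convert decimal odds to moneyline (American) odds with the standard formula.

    Assumption: this mirrors what the page's oneDecToML does; the site's
    own rounding and formatting may differ.
    """
    if decimal_odds <= 1:
        raise ValueError("decimal odds must be greater than 1")
    if decimal_odds >= 2:
        # Underdog: profit on a stake of 100
        return round((decimal_odds - 1) * 100)
    # Favourite: stake needed to win 100, expressed as a negative number
    return round(-100 / (decimal_odds - 1))


print(dec_to_moneyline(5.2))  # 420, displayed on an odds table as +420
print(dec_to_moneyline(1.8))  # -125
```

With a conversion like this in Python, the second execute_script call per point would no longer be needed, at the cost of depending on the assumption above.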

arcquim answered Sep 28 '22 15:09


Another method is to reconstruct the data from the SVG data list described above, using the linear equation y = mx + b to map SVG coordinates to chart values. If some actual data values are known (data points are often displayed on Highcharts charts), the slope can be calculated very accurately. Given that the intercept is known (see below), I ran a regression on 3 known points and it reproduced them precisely (zero error).

Another method, described in detail here, is to reconstruct the data from the highcharts-yaxis-labels element, although its suitability depends on the data and the required accuracy. Extract the y attribute and the text value of each label, use them as x and y respectively, and run a regression analysis.

y="148"... >-125<
y="117"... >+100<
y="85"... >+120<
y="54"... >+140<
y="23"... >+160<
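A minimal sketch of pulling those pairs out of the markup, assuming the labels are plain text elements inside the label group. The fragment below is hypothetical: it only carries the five y positions and label texts listed above, whereas the real markup has more attributes, may wrap text in tspan elements, and sits in the SVG namespace when parsed from the full page.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like a Highcharts y-axis label group.
SVG_FRAGMENT = """
<g class="highcharts-yaxis-labels">
  <text x="0" y="148">-125</text>
  <text x="0" y="117">+100</text>
  <text x="0" y="85">+120</text>
  <text x="0" y="54">+140</text>
  <text x="0" y="23">+160</text>
</g>
"""


def label_points(svg_text):
    """Return (svg_y, label_value) pairs from the <text> elements."""
    root = ET.fromstring(svg_text.strip())
    points = []
    for node in root.iter("text"):
        # itertext() also collects text nested in child elements such as <tspan>
        label = "".join(node.itertext()).strip()
        points.append((float(node.get("y")), int(label)))
    return points


print(label_points(SVG_FRAGMENT))
```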

It is useful to plot these values in a chart, especially in this case, because the relationship is not linear over the whole range. Fortunately, discarding the -125 value gives a nice straight line, and none of the values are less than 100.

x   y
117 100
85  120
54  140
23  160

x           -0.638938504720592
R^2         0.999938759887726

The x value in the regression output is the slope of the line, so m = -0.638938504720592.
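The regression can be reproduced without a spreadsheet. The sketch below is an ordinary least-squares fit in plain Python over the four (x, y) pairs from the table above; linear_fit is a hypothetical helper name. The intercept it returns is expressed in the labels' coordinate system, so it is not necessarily the b used in the rest of this answer.

```python
def linear_fit(points):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# (label y position, label value) pairs, with the -125 label discarded
labels = [(117, 100), (85, 120), (54, 140), (23, 160)]
slope, intercept, r_squared = linear_fit(labels)
print(slope, r_squared)
```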

What about the intercept? The most common coordinate system has its origin at the bottom left, but SVG puts the origin at the top left, with y increasing downwards. This means the intercept has to be taken at the top of the chart. Since this dataset has a value for the top of the chart, the easiest way is to use that top value, so b = 160.

Extract the data list using your preferred method (not described in this answer) and reconstruct the data with the linear equation.

eg ...L 999999 101 ....

y = -0.638938504720592 * 101 + 160 ≈ 95.47, or about 95
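The same calculation can be applied to every point of a path. This is a sketch only: it assumes the path uses absolute M and L commands with plain decimal coordinates, and the path string d below is hypothetical (a real one comes from the d attribute of the series path element). path_points and to_value are hypothetical helper names.

```python
import re

M = -0.638938504720592  # slope from the regression on the y-axis labels
B = 160                 # value taken for the top of the chart


def path_points(d):
    """Return (x, y) pairs from an SVG path made of absolute M/L commands."""
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", d)]
    # Coordinates alternate x, y, x, y, ...
    return list(zip(numbers[0::2], numbers[1::2]))


def to_value(svg_y):
    """Apply y = m*x + b to an SVG y coordinate."""
    return M * svg_y + B


# Hypothetical path string for illustration
d = "M 0 50 L 10 101 L 20 75"
for x, y in path_points(d):
    print(x, round(to_value(y), 2))
```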

Reconstructing the data from the y-axis labels may not be as accurate as using actual data values. If you are lucky, the y-axis labels will sit on a convenient scale and give precise values, but each position can be up to half a unit out at the top and at the bottom of the range. The worst case in this example is therefore (1/2 + 1/2) / 94 = 1.06%, where 94 is the span between the top and bottom label positions used (117 - 23), and the actual error is likely much less.

flywire answered Sep 28 '22 15:09