Scraping data from Highcharts using selenium

Question

I am trying to scrape data from a Highcharts chart. I looked at similar questions, but I didn't understand how execute_script works or how to inspect the page's JavaScript from my browser. Here is my current code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Core settings
chrome_path = r"C:\Users\X\Y\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.implicitly_wait(15)

stats_url = 'https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/'

driver.get(stats_url)
driver.find_element_by_link_text('by Source').click()
driver.find_element_by_id('custom-date-range').click()
year = driver.find_element_by_id('date-range-start')
year.click()
for i in range(5): # goes back 5 years
    year.send_keys(Keys.ARROW_DOWN)
driver.find_element_by_id('date-range-submit').click()

I want to scrape the "download" data from the graph (not only for this page, but for many pages). When I use the custom date range option, the CSV file that the website generates automatically is not updated, so the only way seems to be to scrape the data from the graph. How can I do that?

Florent B. · Accepted Answer

Mozilla provides a simple REST API to get the stats, so you don't need to use Selenium.

With the requests module:

import requests

url = "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20170823-20171023.json"
data = requests.get(url).json()

To select the range, simply update the dates in the URL.
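As a minimal sketch of that URL pattern, the helper below builds the endpoint for a given add-on and date range. The name `stats_url` and its parameters are illustrative, not part of Mozilla's API; the URL layout is copied from the example above and may have changed since.

```python
from datetime import date

BASE = "https://addons.mozilla.org/en-US/firefox/addon/{slug}/statistics/"


def stats_url(slug, start, end, fmt="json"):
    """Build a downloads-per-day URL following the pattern shown above.

    Hypothetical helper: only the "downloads-day" pattern is taken from
    the answer; other metrics are not assumed here.
    """
    # Dates are embedded as YYYYMMDD, e.g. 20170823-20171023.
    return (
        BASE.format(slug=slug)
        + f"downloads-day-{start:%Y%m%d}-{end:%Y%m%d}.{fmt}"
    )


url = stats_url("adblock-plus", date(2017, 8, 23), date(2017, 10, 23))
print(url)
```

The resulting string can be passed to requests.get(url).json() exactly as above.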

But if you still want to scrape the chart with Selenium:

dates = driver.execute_script("return Highcharts.charts[0].series[0].xData")
users = driver.execute_script("return Highcharts.charts[0].series[0].yData")
downloads = driver.execute_script("return Highcharts.charts[0].series[1].yData")
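The values come back as plain Python lists. A rough sketch of turning them into dated points follows, assuming the chart uses a datetime x-axis, in which case Highcharts stores x values as milliseconds since the Unix epoch. `points_from_series` is a hypothetical helper, and the sample numbers are made up for illustration.

```python
from datetime import datetime, timezone


def points_from_series(x_ms, y_values):
    """Pair Highcharts xData (assumed epoch milliseconds) with yData."""
    points = []
    for ms, value in zip(x_ms, y_values):
        # Convert milliseconds to seconds and read the timestamp as UTC.
        day = datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()
        points.append((day, value))
    return points


# Hypothetical usage with the lists returned by execute_script:
# rows = points_from_series(dates, downloads)

# Illustrative input only, not real statistics.
sample = points_from_series([1508716800000, 1508803200000], [10, 12])
print(sample)
```

If the axis is not a datetime axis, the x values should be used as they are instead of being converted.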

Davide Patti · Answer

I noticed one thing about this statement in the question:

"when I use custom search option, csv file that automatically generated by the website is not updated".

It looks true, but it is not. The file is updated; the maximum custom date range just seems to be 1 year.

For example, if you set the range from 2013-09-23 to 2017-10-23, the generated .csv (or .json) contains at most 1 year of data (in this example, from 2016-10-22 to 2017-10-21).

This is easier to see if you vary the start and end dates of the range.

For example, with:

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131023-20141023.json

first element: {"date": "2014-10-23", "count": 212730, "end": "2014-10-23"}
last element: {"date": "2013-10-24", "count": 163094, "end": "2013-10-24"}

If you move the end date forward by one day:

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131023-20141024.json

first element: {"date": "2014-10-24", "count": 215105, "end": "2014-10-24"}
last element: {"date": "2013-10-25", "count": 168018, "end": "2013-10-25"}

Or, if you move the start date back by one day instead:

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131022-20141023.json

the result is again:

first element: {"date": "2014-10-23", "count": 212730, "end": "2014-10-23"}
last element: {"date": "2013-10-24", "count": 163094, "end": "2013-10-24"}
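Since each response is capped at about a year and lists the newest record first, results from several requests have to be merged. A minimal sketch, assuming every record is a dict with the `date` and `count` keys shown above; `merge_records` is a hypothetical helper, and the two batches reuse the sample records from this answer.

```python
def merge_records(*batches):
    """Merge lists of {"date", "count", ...} records into (date, count) pairs, oldest first."""
    by_date = {}
    for batch in batches:
        for record in batch:
            # Overlapping windows repeat dates; keep one count per date.
            by_date[record["date"]] = record["count"]
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return sorted(by_date.items())


newer = [
    {"date": "2014-10-24", "count": 215105, "end": "2014-10-24"},
    {"date": "2014-10-23", "count": 212730, "end": "2014-10-23"},
]
older = [
    {"date": "2014-10-23", "count": 212730, "end": "2014-10-23"},
    {"date": "2013-10-24", "count": 163094, "end": "2013-10-24"},
]
merged = merge_records(newer, older)
print(merged)
```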

So, to get the data for the last 5 years, you could request one year at a time:

import subprocess

interested_years = 5
today = "2017-10-23"
year_str, month, day = today.split("-")
date_end = year_str + month + day
url = "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-"

for offset in range(1, interested_years + 1):
    # Each request covers one year, because the site caps the range at about a year.
    date_start = str(int(year_str) - offset) + month + day
    tmp_url = url + date_start + "-" + date_end + ".csv"
    cmd = ["curl", "-O", tmp_url]
    print(" ".join(cmd))
    # curl -O saves the CSV in the current directory under its remote file name.
    subprocess.run(cmd, capture_output=True)
    # The next window ends where this one started.
    date_end = date_start
    print("-----------------------------")
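The same windowing can be written with date objects instead of string slicing, which avoids shelling out to curl. This is only a sketch: `shift_years` and `year_windows` are hypothetical helpers, the URL pattern is the one used above, and the Feb 29 fallback is an assumption about how a leap-day end date should be handled.

```python
from datetime import date

URL = ("https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/"
       "statistics/downloads-day-{start:%Y%m%d}-{end:%Y%m%d}.json")


def shift_years(day, years):
    """Return `day` moved back by `years`, mapping Feb 29 to Feb 28 if needed."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return day.replace(year=day.year - years, day=28)


def year_windows(end, years):
    """Yield (start, end) pairs of one year each, newest window first."""
    for _ in range(years):
        start = shift_years(end, 1)
        yield start, end
        end = start


windows = list(year_windows(date(2017, 10, 23), 5))
urls = [URL.format(start=s, end=e) for s, e in windows]
for u in urls:
    print(u)
```

Each URL can then be fetched with requests.get(u).json(), as in the accepted answer.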

Tags: python, selenium, highcharts

edyvedy13
