 

Download file with R given a JavaScript Statement

I want to create an R script that, among other things, downloads baseball player projection data from http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips. There is a link to export this data to .csv near the top right corner of the data table, but it appears to be a JavaScript command (javascript:__doPostBack('ProjectionBoard1$cmdCSV','')). I am familiar with calling download.file() on a direct link to a .csv file, but I am not sure how to approach this.

How can I use R to extract this data?

Asked Jul 21 '14 at 02:07 by user3271783





1 Answer

The download isn't a simple response that can be easily retrieved with download.file. The web page constructs a FORM with some huge parameters that store the state of the web page, then passes these (and a load of cookies too) to the server to get the CSV response.

To make this work in R (or any other programming language) you need to construct that request yourself, which you can usually only do by first getting the web page, scraping the FORM parameters (and cookies), and then constructing the same POST request your browser sent when you clicked on the link.

This might be possible with RCurl, and it can sometimes be easier if your browser's developer tools can save the parameters of the POST request, so that you can then get RCurl to send them.
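A minimal sketch of that approach follows. It uses the httr package in place of RCurl (a swapped-in choice, not something prescribed above) and assumes the page is a standard ASP.NET form: its state lives in hidden input fields such as __VIEWSTATE and __EVENTVALIDATION, and posting those back with __EVENTTARGET set to the control name from the link (ProjectionBoard1$cmdCSV) returns the CSV. The helper names hidden_inputs, postback_body and download_postback_csv are hypothetical. Hidden fields are extracted with a base-R regular expression to keep the sketch dependency-free; a real HTML parser such as xml2 would be more robust, and a real form may also need its visible inputs (dropdowns, text boxes) sent back as well.

```r
# Sketch only: assumes an ASP.NET-style postback form (hypothetical helpers).

# Return the hidden <input> fields of a page as a named list (name -> value).
# Base-R regex keeps this dependency-free; it assumes double-quoted attributes
# and is less robust than a real HTML parser such as xml2.
hidden_inputs <- function(html) {
  tags <- regmatches(
    html,
    gregexpr("<input[^>]*type=\"hidden\"[^>]*>", html, ignore.case = TRUE)
  )[[1]]
  # Pull one attribute's value out of a single tag ("" if it is absent).
  get_field <- function(tag, field) {
    pattern <- paste0("[[:space:]]", field, "=\"([^\"]*)\"")
    m <- regmatches(tag, regexec(pattern, tag))[[1]]
    if (length(m) == 2) m[2] else ""
  }
  vals <- vapply(tags, get_field, character(1), field = "value", USE.NAMES = FALSE)
  names(vals) <- vapply(tags, get_field, character(1), field = "name", USE.NAMES = FALSE)
  as.list(vals)
}

# Build the POST body that __doPostBack(target, argument) would submit:
# the page's hidden state plus the event target and argument.
postback_body <- function(html, target, argument = "") {
  body <- hidden_inputs(html)
  body[["__EVENTTARGET"]] <- target
  body[["__EVENTARGUMENT"]] <- argument
  body
}

# Fetch the page, replay the postback and save the response to `dest`.
# Requires the httr package; httr reuses one connection handle per host,
# so cookies set by the GET are sent along with the POST.
download_postback_csv <- function(url, target, dest) {
  page <- httr::GET(url)
  httr::stop_for_status(page)
  html <- httr::content(page, as = "text", encoding = "UTF-8")
  resp <- httr::POST(url, body = postback_body(html, target), encode = "form")
  httr::stop_for_status(resp)
  writeBin(httr::content(resp, as = "raw"), dest)
  invisible(dest)
}

# Example (not run; the site may no longer work this way):
# url <- "http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips"
# download_postback_csv(url, "ProjectionBoard1$cmdCSV", "zips.csv")
# zips <- read.csv("zips.csv")
```

Splitting the parsing from the network call means the form-building step can be checked against a saved copy of the page before any request is sent, which helps when comparing it with the POST captured in the browser's developer tools.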

Another common technique in web scraping is to run a real browser that can be automated from a scripting language. RSelenium, an R package that leverages Selenium, might be able to do this:

http://cran.r-project.org/web/packages/RSelenium/index.html

There are some related (but not duplicate) questions here, such as:

How to use R to download a zipped file from a SSL page that requires cookies

An R-help posting from a couple of years ago has some suggestions too:

https://stat.ethz.ch/pipermail/r-help//2012-September/335769.html

Answered Sep 20 '22 at 03:09 by Spacedman
