
Downloading Live Olympic Medal Data into R

Tags:

r

It looks like the website is blocking direct access from curl (here, via RCurl).

library(XML) 
library(RCurl) 
theurl <- "http://www.london2012.com/medals/medal-count/"
page <- getURL(theurl)

page # fail
[1] "<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don't have permission to access \"http&#58;&#47;&#47;www&#46;london2012&#46;com&#47;medals&#47;medal&#45;count&#47;\" on this server.<P>\nReference&#32;&#35;18&#46;358a503f&#46;1343590091&#46;c056ae2\n</BODY>\n</HTML>\n"

Let's see whether readHTMLTable can read the table directly from the URL.

page <- readHTMLTable(theurl)

No luck there:

Error in htmlParse(doc) : error in creating parser for http://www.london2012.com/medals/medal-count/

How would you go about getting this table into R?


Update: in response to comments and some experimenting, faking a user agent string worked to get the content. But readHTMLTable returns an error.

page <- getURLContent(theurl, useragent="Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
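A minimal sketch of the user-agent workaround, wrapped in a helper so that a blocked response can be checked for explicitly. The names `fetch_page` and `is_access_denied` are hypothetical, and matching on the "Access Denied" title is an assumption based on the error page shown earlier in this question; other servers will word their refusals differently.

```r
# Hypothetical helper: request a page while presenting a browser-like user agent.
# RCurl is only needed when the function is actually called.
fetch_page <- function(url,
                       agent = "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2") {
  RCurl::getURLContent(url, useragent = agent)
}

# Hypothetical check: does the returned HTML look like the "Access Denied" page?
# Assumes the refusal page carries that text in its <TITLE>, as in the output above.
is_access_denied <- function(html) {
  grepl("<title>access denied</title>", tolower(html), fixed = TRUE)
}

# Usage (needs network access, so it is left commented out):
# page <- fetch_page(theurl)
# if (is_access_denied(page)) stop("server refused the request")
```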
Brandon Bertelsen asked Jul 29 '12 19:07


1 Answer

It looks like this works:

## page is the HTML text fetched with the faked user agent (see the question's update)
rr <- readHTMLTable(page,header=FALSE)
rr2 <- setNames(rr[[1]],
                c("rank","country","gold","silver","bronze","junk","total"))
rr3 <- subset(rr2,select=-junk)
## oops, numbers all got turned into factors ...
tmpf <- function(x) { as.numeric(as.character(x)) }
rr3[,-2] <- sapply(rr3[,-2],tmpf)               
head(rr3)
##   rank                                country gold silver bronze total
## 1    1             People's Republic of China    6      4      2    12
## 2    2               United States of America    3      5      3    11
## 3    3                                  Italy    2      3      2     7
## 4    4                      Republic of Korea    2      1      2     5
## 5    5                                 France    2      1      1     4
## 6    6 Democratic People's Republic  of Korea    2      0      1     3
with(rr3,dotchart(total,country))
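The `as.numeric(as.character(x))` step matters because calling `as.numeric()` directly on a factor returns its internal level codes rather than the values printed on screen. Here is a small base-R sketch on made-up data (the `medals` data frame is a toy stand-in, not the scraped table); it uses `lapply` instead of `sapply` so that each column stays its own vector, which is a stylistic choice and not a correction to the code above.

```r
# Same conversion helper as in the answer above.
tmpf <- function(x) { as.numeric(as.character(x)) }

# Toy column standing in for a scraped medal count that arrived as a factor.
gold <- factor(c("6", "3", "2", "10"))

as.numeric(gold)   # level codes (positions among the sorted labels), not the counts
tmpf(gold)         # the counts themselves

# Applying the conversion to every column except the country name.
medals <- data.frame(country = factor(c("A", "B")),
                     gold    = factor(c("6", "10")),
                     total   = factor(c("12", "3")))
medals[-1] <- lapply(medals[-1], tmpf)
str(medals)
```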
Ben Bolker answered Oct 26 '22 10:10