It looks like the website blocks direct requests made through RCurl.
library(XML)
library(RCurl)
theurl <- "http://www.london2012.com/medals/medal-count/"
page <- getURL(theurl)
page # fail
[1] "<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don't have permission to access \"http://www.london2012.com/medals/medal-count/\" on this server.<P>\nReference #18.358a503f.1343590091.c056ae2\n</BODY>\n</HTML>\n"
Let's see whether readHTMLTable can read the table directly from the URL.
page <- readHTMLTable(theurl)
No luck there:
Error in htmlParse(doc) : error in creating parser for http://www.london2012.com/medals/medal-count/
How would you go about getting this table into R?
Update: following the comments and some experimenting, faking a user agent string gets the content, but readHTMLTable still returns an error.
page <- getURLContent(theurl, useragent="Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20120716 Firefox/15.0a2")
It looks like this works:
rr <- readHTMLTable(page,header=FALSE)
rr2 <- setNames(rr[[1]],
c("rank","country","gold","silver","bronze","junk","total"))
rr3 <- subset(rr2,select=-junk)
## oops, numbers all got turned into factors ...
tmpf <- function(x) { as.numeric(as.character(x)) }
rr3[,-2] <- sapply(rr3[,-2],tmpf)
head(rr3)
##   rank                               country gold silver bronze total
## 1    1            People's Republic of China    6      4      2    12
## 2    2              United States of America    3      5      3    11
## 3    3                                 Italy    2      3      2     7
## 4    4                     Republic of Korea    2      1      2     5
## 5    5                                France    2      1      1     4
## 6    6 Democratic People's Republic of Korea    2      0      1     3
with(rr3,dotchart(total,country))
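A side note on why tmpf goes through as.character() first: calling as.numeric() directly on a factor returns the internal level codes, not the numbers you see printed. Here is a minimal base-R sketch of the difference. The `gold` vector is made-up illustration data, not the real medal table.

```r
# Hypothetical column, stored as a factor the way readHTMLTable returned it
gold <- factor(c("6", "3", "2", "10"))

# Pitfall: as.numeric() on a factor gives the integer level codes,
# which depend on the (alphabetical) ordering of the levels
codes <- as.numeric(gold)

# What tmpf does: convert the labels to character, then to numeric
values <- as.numeric(as.character(gold))

# Equivalent idiom: convert each distinct level once, then index by the factor
values2 <- as.numeric(levels(gold))[gold]

print(codes)   # level codes, not the medal counts
print(values)  # the actual numbers
```

If your version of the XML package supports it, passing stringsAsFactors = FALSE to readHTMLTable should avoid the factor step altogether, though the columns would then come back as character and still need as.numeric().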