I'm trying to load some publicly available NHS data using R and the XML package but I keep getting the following error message:
Error: failed to load external entity "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
I can't figure out what is causing this, despite looking through a few related questions.
Here is my very simple code:
library("XML")
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(url)
Edit: Session Information
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.1
You can also use the rvest and xml2 packages:
library(rvest) # github version
library(xml2) # github version
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- read_html(url)
doc %>%
html_nodes("a[href^='http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/']") %>%
html_attr("href")
## [1] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/"
## [2] "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-day-only/"
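To see what the `a[href^='...']` selector is doing without hitting the network, here is a small sketch run against an inline HTML fragment. The fragment is made up for illustration (it is not the real page), and it assumes rvest and xml2 are installed. The `^=` operator keeps only links whose href starts with the given prefix:

```r
library(rvest)
library(xml2)

# Hypothetical HTML fragment standing in for the downloaded page
html <- '<html><body>
  <a href="http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/bed-data-overnight/">Overnight</a>
  <a href="http://www.england.nhs.uk/statistics/other/">Other</a>
</body></html>'

# read_html() accepts literal HTML as well as a URL or file path
doc <- read_html(html)

# Keep only <a> elements whose href begins with the given prefix
prefix <- "http://www.england.nhs.uk/statistics/bed-availability-and-occupancy/"
links <- doc %>%
  html_nodes(paste0("a[href^='", prefix, "']")) %>%
  html_attr("href")

print(links)
```

Only the first link should come back, since the second href does not start with the prefix.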
The XML package has some issues here. The problem is intermittent and has nothing to do with the URL. I solved it by using the GET function from the httr package to fetch the HTML and then passing the result to htmlParse, see below:
library("XML")
library(httr)
url <- "http://www.england.nhs.uk/statistics/statistical-work-areas/bed-availability-and-occupancy/"
doc <- htmlParse(rawToChar(GET(url)$content))
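A slightly more defensive variant of the same idea, as a sketch only: it checks the HTTP status before parsing and lets httr decode the body instead of calling rawToChar directly. The helper names fetch_html and parse_html_text are hypothetical, and the UTF-8 encoding is an assumption about the page rather than something I have verified.

```r
library(XML)
library(httr)

# Hypothetical helper: download a page and return its body as one string.
fetch_html <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # raise an R error on 4xx/5xx responses
  content(resp, as = "text", encoding = "UTF-8")  # assumes the page is UTF-8
}

# Hypothetical helper: parse HTML that is already in memory.
# asText = TRUE tells htmlParse the argument is markup, not a file or URL.
parse_html_text <- function(html) {
  htmlParse(html, asText = TRUE)
}
```

With a working connection, doc <- parse_html_text(fetch_html(url)) should give the same kind of document object as the one-liner above. Splitting download and parsing apart also makes it easier to tell which step fails when the error is intermittent.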