In R, how can I download all the files on a website? I know how to download them one by one, but not all at once. For example:
http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/
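For reference, the one-at-a-time approach looks something like this (the file name below is just a placeholder, not an actual file on that page):

## download a single file by hand -- fine for one file, tedious for all 56
url <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"
one_zip <- "some_file.zip"  # placeholder; each name would have to be typed out
download.file(paste0(url, one_zip), destfile = one_zip)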
I tested this on a small subset (3) of the 56 files on the page, and it works fine.
## your base url
url <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"
## query the url to get all the file names ending in '.zip'
zips <- XML::getHTMLLinks(
    url,
    xpQuery = "//a/@href['.zip'=substring(., string-length(.) - 3)]"
)
## create a new directory 'myzips' to hold the downloads
dir.create("myzips")
## save the current directory path for later
wd <- getwd()
## change working directory for the download
setwd("myzips")
## create all the new files
file.create(zips)
## download them all
lapply(paste0(url, zips), function(x) download.file(x, basename(x)))
## reset working directory to original
setwd(wd)
Now all the zip files are in the directory myzips and are ready for further processing. As an alternative to lapply() you could also use a for() loop.
## download them all
for(u in paste0(url, zips)) download.file(u, basename(u))
And of course, setting quiet = TRUE may be nice since we're downloading 56 files.
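If you'd rather not change the working directory at all, a minimal variant (assuming the url and zips objects from above) passes the target path via destfile and sets quiet = TRUE directly:

## download into 'myzips' without setwd(), quietly
dir.create("myzips", showWarnings = FALSE)
for (u in paste0(url, zips)) {
    download.file(u, destfile = file.path("myzips", basename(u)), quiet = TRUE)
}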
Slightly different approach.
library(rvest)
library(httr)
library(pbapply)
library(stringi)
URL <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"
pg <- read_html(URL)
zips <- grep("zip$", html_attr(html_nodes(pg, "a[href^='TAB']"), "href"), value=TRUE)
invisible(pbsapply(zips, function(zip_file) {
    GET(URL %s+% zip_file, write_disk(zip_file))
}))
You get a progress bar with this and built-in "caching" (write_disk won't overwrite already downloaded files).
You can weave in Richard's excellent code for dir creation & file checking as well.
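For instance, one way to weave that in (a sketch, assuming the URL, zips, and libraries from above) is to create the download directory up front and skip files that are already on disk:

## create the target directory if needed and only fetch files not yet downloaded
dir.create("myzips", showWarnings = FALSE)
todo <- zips[!file.exists(file.path("myzips", zips))]
invisible(pbsapply(todo, function(zip_file) {
    GET(URL %s+% zip_file, write_disk(file.path("myzips", zip_file)))
}))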