 

Download All Files From a Folder on a Website

Tags: r, download

In R, how can I download all the files in a folder on a website? I know how to download them one by one, but not all at once. For example:

http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/

victoria asked Nov 18 '15 at 20:11


2 Answers

I tested this on a small subset (3) of the 56 files on the page, and it works fine.

## your base url
url <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"
## query the url to get all the file names ending in '.zip'
zips <- XML::getHTMLLinks(
    url, 
    xpQuery = "//a/@href['.zip'=substring(., string-length(.) - 3)]"
)
## create a new directory 'myzips' to hold the downloads
dir.create("myzips")
## save the current directory path for later
wd <- getwd()
## change working directory for the download
setwd("myzips")
## create all the new files
file.create(zips)
## download them all (mode = "wb" keeps the binary .zip files intact on Windows)
lapply(paste0(url, zips), function(x) download.file(x, basename(x), mode = "wb"))
## reset working directory to original
setwd(wd)

Now all the zip files are in the directory myzips, ready for further processing. As an alternative to lapply(), you could also use a for() loop:

## download them all
for(u in paste0(url, zips)) download.file(u, basename(u), mode = "wb")

Setting quiet = TRUE in download.file() is also worth considering, since it suppresses the status messages and progress bar for each of the 56 files.
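The xpQuery passed to getHTMLLinks() above keeps only hrefs whose last four characters are ".zip" (XPath's substring() is 1-based, so substring(., string-length(.) - 3) returns the final four characters). If the XML package isn't available, the same filter can be sketched in base R. zip_links() is a hypothetical helper, and its regex assumes the page uses plain <a href="..."> markup like a typical directory listing; a real HTML parser is more robust.

```r
# Hypothetical helper: return the href values on a page that end in ".zip".
# Regex-based, so it assumes simple <a href="..."> markup (e.g. a directory listing).
zip_links <- function(html) {
  # Accept one string or a character vector of lines (e.g. from readLines())
  html <- paste(html, collapse = "\n")
  # Collect every href="..." attribute
  hrefs <- regmatches(html, gregexpr('href="[^"]*"', html))[[1]]
  # Strip the leading href=" and the trailing quote
  hrefs <- sub('^href="(.*)"$', "\\1", hrefs)
  # Same test as the XPath predicate: the last four characters equal ".zip"
  hrefs[substring(hrefs, nchar(hrefs) - 3) == ".zip"]
}
```

Something like zip_links(readLines(url)) should then yield the same kind of file-name vector as the zips object above.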

Rich Scriven answered Oct 05 '22 at 05:10


Here's a slightly different approach.

library(rvest)
library(httr)
library(pbapply)
library(stringi)

URL <- "http://www2.census.gov/geo/docs/maps-data/data/rel/t00t10/"

pg <- read_html(URL)
# keep the links whose href starts with 'TAB' and ends in 'zip'
zips <- grep("zip$", html_attr(html_nodes(pg, "a[href^='TAB']"), "href"), value=TRUE)

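invisible(pbsapply(zips, function(zip_file) {
  # write_disk() stops with an error if the file already exists (unless
  # overwrite = TRUE), so skip files that are already on disk
  if (!file.exists(zip_file)) GET(URL %s+% zip_file, write_disk(zip_file))
}))

You get a progress bar with this and simple built-in "caching": write_disk() won't overwrite a file that is already on disk, and the file.exists() check skips those files instead of letting write_disk() raise an error on a re-run.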

You can weave in Richard's excellent code for dir creation & file checking as well.
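Weaving those pieces together without the setwd() round trip might look like this. download_all() is a hypothetical helper, not part of either answer or any package; it sticks to base R, using download.file() with mode = "wb" and quiet = TRUE, and utils::txtProgressBar() in place of pbapply, and it treats "already on disk" as "already downloaded".

```r
# Hypothetical helper (not from either answer or any package): download
# `files` from `base_url` into `dest_dir`, skipping files already on disk.
download_all <- function(base_url, files, dest_dir = "myzips") {
  # Create the target directory; no warning if it already exists
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  fetched <- logical(length(files))
  names(fetched) <- basename(files)
  if (length(files) == 0) return(invisible(fetched))

  dest <- file.path(dest_dir, basename(files))
  pb <- utils::txtProgressBar(min = 0, max = length(files), style = 3)
  on.exit(close(pb))

  for (i in seq_along(files)) {
    # Simple "caching": only fetch files that are not present yet
    if (!file.exists(dest[i])) {
      utils::download.file(paste0(base_url, files[i]), dest[i],
                           mode = "wb",   # binary-safe on Windows
                           quiet = TRUE)  # no per-file status output
      fetched[i] <- TRUE
    }
    utils::setTxtProgressBar(pb, i)
  }
  # Named logical vector: TRUE where a file was downloaded on this call
  invisible(fetched)
}
```

Usage would be something like download_all(URL, zips), with zips built by either answer's link-extraction step. One caveat of the file.exists() shortcut: an interrupted download may leave a partial file behind, which later runs would skip, so delete any suspect file to force a re-fetch.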

hrbrmstr answered Oct 05 '22 at 05:10