I am looking for a way to automate the process of downloading satellite imagery. The screenshot shows the type and format of files I am interested in downloading (.ntf and the 150MB files).
I encountered the following code from TheBioBucket that looks promising, although the R package XML is obsolete.
require(XML)
dir.create("D:/GIS_DataBase/DEM/")
setwd("D:/GIS_DataBase/DEM/")
doc <- htmlParse("http://www.viewfinderpanoramas.org/dem3.html#alps")
urls <- paste0("http://www.viewfinderpanoramas.org", xpathSApply(doc,'//*/a[contains(@href,"/dem1/N4")]/@href'))
names <- gsub(".*dem1/(\\w+\\.zip)", "\\1", urls)
for (i in 1:length(urls)) download.file(urls[i], names[i])
Is there a good way to automate the process of downloading .ntf files programmatically using R or Python?
Scraping is definitely easy to implement in Python.
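# collect.py (Python 3)
import urllib.request
from urllib.parse import urljoin

import bs4

BASE_URL = "http://www.viewfinderpanoramas.org/"

# Fetch and parse the index page.
html = urllib.request.urlopen(BASE_URL + "dem3.html").read()
soup = bs4.BeautifulSoup(html, "html.parser")

# Download every linked tile whose path contains "/dem1/N4".
for link in soup.find_all("a", href=True):  # skip anchors without an href
    href = link["href"]
    if "/dem1/N4" in href:
        url = urljoin(BASE_URL, href)
        filename = href.split("/")[-1]
        urllib.request.urlretrieve(url, filename)
        # break  # uncomment to stop after the first file while testing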
You might want to change filename so that it includes the path to the directory where the file should be saved.
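As a rough sketch of that idea, the helper below builds a local path inside a chosen folder from a download URL. The name `local_path_for`, the `dem_downloads` folder, and the tile name `N46E006.zip` are placeholders for illustration only, not something taken from the site.

```python
import tempfile
from pathlib import Path
from urllib.parse import urljoin, urlsplit


def local_path_for(url, dest_dir):
    """Return the path inside dest_dir where the file at url would be saved."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    # Use only the last component of the URL path as the local file name.
    return dest / Path(urlsplit(url).path).name


# Hypothetical link, used only to show the path handling.
url = urljoin("http://www.viewfinderpanoramas.org/", "/dem1/N46E006.zip")
target = local_path_for(url, Path(tempfile.gettempdir()) / "dem_downloads")
print(target.name)  # N46E006.zip
```

The resulting path can then be passed as the second argument of the download call, for example `urllib.request.urlretrieve(url, str(target))`.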
In R, the XML package can facilitate what you need fairly easily. Here's a place to start:
library(XML)
demdir <- "http://www.viewfinderpanoramas.org/dem1/"
# this returns a data.frame with file names
dems <- readHTMLTable(demdir)[[1]]
# you'll want, for example, to download only zip files
demnames <- dems[grepl(".zip", dems$Name, fixed = TRUE), "Name"]
# (but you can add other subsetting/selection operations here)
# download the files to the same name locally
# (change '.' if you want some other directory)
sapply(demnames, function(demfi) download.file(paste0(demdir,demfi), file.path(".",demfi)))
The only complication I can see is that if a filename is too long (that is, if it appears truncated in your web browser), the filename in dems will also be truncated.
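One possible way around the truncation, sketched here in Python, is to read each link's href attribute instead of the text shown in the table, since the href usually carries the full file name even when the displayed name is shortened. This assumes the listing is plain HTML with ordinary anchor tags. The `LinkCollector` class, the `example.com` base URL, and the sample listing are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of links whose href ends with a given suffix."""

    def __init__(self, base_url, suffix):
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix.lower()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Match on the href, not on the (possibly truncated) link text.
        if href and href.lower().endswith(self.suffix):
            self.urls.append(urljoin(self.base_url, href))


# Made-up listing: the visible text is truncated, the href is not.
sample_listing = """
<table>
<tr><td><a href="a_very_long_scene_identifier_0001.ntf">a_very_long_scene_ide..&gt;</a></td></tr>
<tr><td><a href="readme.txt">readme.txt</a></td></tr>
</table>
"""

collector = LinkCollector("http://example.com/imagery/", ".ntf")
collector.feed(sample_listing)
print(collector.urls)
```

For a real page, the HTML passed to `feed` would come from `urllib.request.urlopen(...).read().decode()`, and each collected URL could then be downloaded in a loop.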