To use a specific example, I want to download the binary release of Hadoop 2.7.2. The web site points to http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz which then points to the closest mirror by location. For me that is http://xenia.sote.hu/ftp/mirrors/www.apache.org/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz.
I want to actually download this in a shell script (a Dockerfile to be specific). I would prefer to use a location-agnostic URL for the download so that if someone runs the script on the other end of the world, they would not use the same mirror.
Is there a URL I could use with wget or curl that dynamically redirects to the closest mirror? What would that URL be for this specific file?
Wget can also find all the files that make up a website automatically and download them into the same directory structure as the site, which essentially gives you an offline version of it. Pass the -m (mirror) flag and the URL of the site you want to mirror.
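As a minimal sketch of the mirror mode described above: the helper name mirror_site and the example.com address are placeholders introduced here for illustration, not something from the original text.

```shell
#!/bin/sh
# mirror_site is a hypothetical wrapper name; it only passes the URL to wget.
mirror_site() {
    # $1: root URL of the site to copy
    # -m (--mirror) enables recursive download with timestamping, so a
    # repeated run only re-fetches files that changed on the server.
    wget -m "$1"
}

# Example call (performs network requests, so it is left commented out):
# mirror_site https://example.com/
```

The downloaded tree is written below the current directory, so it is usually worth running this from an empty folder.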
The most basic command you can execute with wget is just supplying the URL of the file you wish to download. Wget will download the specified file to whatever location you are running the command from.
In order to download a file using Wget, type wget followed by the URL of the file that you wish to download. Wget will download the file in the given URL and save it in the current directory. Let’s download a minified version of jQuery using the following command:
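The command itself did not survive in this copy, so the following is only a sketch. The jQuery version and CDN address are assumptions chosen for illustration, and download_to_cwd is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed URL: the original command was not preserved, so this address is a guess
# at a typical minified jQuery file on the jQuery CDN.
JQUERY_URL='https://code.jquery.com/jquery-3.6.0.min.js'

# Hypothetical helper: plain wget with the URL as its only argument.
download_to_cwd() {
    # Without -O, wget names the local file after the last path component of the URL.
    wget "$1"
}

# The file name wget would use in the current directory:
saved_name=$(basename "$JQUERY_URL")
echo "$saved_name"

# Example call (performs a network request, so it is left commented out):
# download_to_cwd "$JQUERY_URL"
```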
The wget command is an internet file downloader that can download anything from files and web pages all the way through to entire websites. The wget command is in the format of: wget [options] [URL]
The source code of closer.lua actually states that the action and filename query parameters may be used to produce a redirect to the requested file on the automatically chosen mirror instead of the usual HTML mirror selection page.
So you can download the file directly via this URL: https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz. The request and the redirect it returns look like this:
GET /dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: www.apache.org

HTTP/1.1 302 Found
Cache-Control: max-age=3600
Connection: Keep-Alive
Content-Length: 0
Date: Mon, 13 Mar 2017 18:08:00 GMT
Expires: Mon, 13 Mar 2017 19:08:00 GMT
Keep-Alive: timeout=30, max=100
Location: http://ftp-stud.hs-esslingen.de/pub/Mirrors/ftp.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Server: Apache/2.4.7 (Ubuntu)
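
For use in a shell script or a Dockerfile RUN step, the redirect above could be wrapped roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, curl is assumed to be installed, and nothing here checks whether the mirrors still carry a given release.

```shell
#!/bin/sh
# Build the location-agnostic redirect URL for a path below the Apache mirror root.
# (apache_download_url is a hypothetical helper name.)
apache_download_url() {
    # $1: path relative to the mirror root, e.g. hadoop/common/.../file.tar.gz
    printf 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=%s\n' "$1"
}

# Download through the redirect.
# -f fails on HTTP errors instead of saving an error page,
# -S still prints the error message, -L follows the 302 to the chosen mirror.
fetch_from_apache_mirror() {
    # $1: path relative to the mirror root; $2: output file name
    curl -fSL -o "$2" "$(apache_download_url "$1")"
}

# Print the mirror that was picked without downloading the file:
# curl does not follow the redirect here and only reports its target.
show_chosen_mirror() {
    curl -s -o /dev/null -w '%{redirect_url}\n' "$(apache_download_url "$1")"
}

# Example calls (they perform network requests, so they are left commented out):
# show_chosen_mirror hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
# fetch_from_apache_mirror hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz hadoop-2.7.2.tar.gz
```

With wget the -L equivalent is not needed, since wget follows redirects by default; wget -O hadoop-2.7.2.tar.gz followed by the quoted URL should behave the same way. Quote the URL in either case, because an unquoted & would be interpreted by the shell.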