
Downloading file from the right Apache Mirror with wget

Tags: url, download

To use a specific example, I want to download the binary release of Hadoop 2.7.2. The web site points to http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz which then points to the closest mirror by location. For me that is http://xenia.sote.hu/ftp/mirrors/www.apache.org/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz.

I want to download this from a shell script (a Dockerfile, to be specific). I would prefer a location-agnostic URL, so that someone running the script on the other side of the world gets a mirror close to them instead of mine.

Is there a URL I could use with wget or curl that dynamically redirects to the closest mirror? What would that URL be for this specific file?

asked Jul 26 '16 at 14:07 by Daniel Darabos



1 Answer

The source code of closer.lua actually states that the action and filename query parameters may be used to produce a redirect to the requested file on the automatically chosen mirror instead of the usual HTML mirror selection page.

So you can download the file directly via this URL: https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz. The request and the redirect response look like this:

GET /dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: www.apache.org



HTTP/1.1 302 Found
Cache-Control: max-age=3600
Connection: Keep-Alive
Content-Length: 0
Date: Mon, 13 Mar 2017 18:08:00 GMT
Expires: Mon, 13 Mar 2017 19:08:00 GMT
Keep-Alive: timeout=30, max=100
Location: http://ftp-stud.hs-esslingen.de/pub/Mirrors/ftp.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Server: Apache/2.4.7 (Ubuntu)
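
As a minimal sketch of how this could be used from a shell script or a Dockerfile RUN step, the redirecting URL can be wrapped in small helper functions. The function names apache_mirror_url, fetch_with_wget and fetch_with_curl are hypothetical, and the sketch assumes the endpoint still answers with a 302 redirect as in the exchange above. wget follows redirects by default, while curl needs -L to do so.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any Apache tooling).

# Print the location-agnostic URL for a path relative to the Apache dist tree,
# e.g. hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
apache_mirror_url() {
  printf 'https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=%s\n' "$1"
}

# Download $1 (path in the dist tree) to $2 (local file) with wget.
# wget follows the 302 redirect to the chosen mirror on its own.
fetch_with_wget() {
  wget -O "$2" "$(apache_mirror_url "$1")"
}

# Same with curl: -L follows the redirect, -f fails on HTTP errors.
fetch_with_curl() {
  curl -fL -o "$2" "$(apache_mirror_url "$1")"
}
```

For example, `fetch_with_wget hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz hadoop-2.7.2.tar.gz` would save the archive under that name in the current directory. The URL has to stay quoted when passed to wget or curl, because an unquoted & would be interpreted by the shell.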
answered Oct 20 '22 at 09:10 by Tom