
Download all .tar.gz files from website/directory using WGET

So I'm attempting to create an alias/script to download all files with a specific extension from a website/directory using wget, but I feel like there must be an easier way than what I've come up with.

Right now, the command I've come up with from searching Google and the man pages is:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/

So in the example above, I'm trying to download all the .tar.gz files from the OpenVZ precreated templates directory.

The above command works correctly, but I have to manually specify --cut-dirs=2, which cuts out the /template/precreated/ directory structure that would otherwise be created, and it also downloads the robots.txt file.

That isn't necessarily a problem, and it's easy to just remove the robots.txt file afterwards, but I was hoping I had simply missed something in the man pages that would do the same thing without my having to specify the directory structure to cut out...
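For context, here's roughly how those flags affect where downloaded files land (a sketch; <name> is a placeholder):

# a plain recursive fetch mirrors the full remote path:
#   ./download.openvz.org/template/precreated/<name>.tar.gz
# -nH drops the hostname directory:
#   ./template/precreated/<name>.tar.gz
# --cut-dirs=2 then strips the first two remaining path components:
#   ./<name>.tar.gz
# (--no-directories saves everything straight into the current directory as well)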

Thanks ahead of time for any help; it's greatly appreciated!

asked Jan 23 '13 by sMyles

2 Answers

Use the -R option:

-R robots.txt,unwanted-file.txt

It takes a comma-separated reject list of file names or patterns you don't want.

As for scripting this:

URL=http://download.openvz.org/template/precreated/
# count the path components after the hostname so --cut-dirs strips them all
CUTS=$(echo "${URL#http://}" | awk -F '/' '{print NF - 2}')
wget -r -l1 -nH --cut-dirs="${CUTS}" --no-parent -A.tar.gz --no-directories -R robots.txt "${URL}"

That should work for however many subdirectories are in your URL.
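If you want that as the reusable alias/script the question asked for, here's a minimal sketch (the function name wget_tars is hypothetical):

wget_tars() {
    # usage: wget_tars http://host/path/to/dir/   (trailing slash expected, as above)
    local url=$1
    local cuts
    # number of directory components after the hostname
    cuts=$(echo "${url#http://}" | awk -F '/' '{print NF - 2}')
    wget -r -l1 -nH --cut-dirs="${cuts}" --no-parent -A.tar.gz --no-directories -R robots.txt "${url}"
}

# example:
# wget_tars http://download.openvz.org/template/precreated/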

answered by Anew


If this is really annoying and you're having to do it a lot, I would suggest just writing a really short two-line script that deletes it for you:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/
rm robots.txt
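Since the question asked for an alias/script, those two lines could also live in a tiny script (the file name is illustrative, and rm -f is a small tweak so the script doesn't complain if robots.txt wasn't fetched):

#!/bin/sh
# fetch-templates.sh -- grab the tarballs, then clean up robots.txt
wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/
rm -f robots.txt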
answered by Roguebantha