
BASH script: Downloading consecutive numbered files with wget

I have a web server that saves the log files of a web application with numbered file names. An example file name would be:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last 3 digits are a counter, and it can sometimes go up to 100.

I usually open a web browser, browse to the file like:

http://someaddress.com/logs/dbsclog01s001.log 

and save the files. This of course gets a bit annoying when you get 50 logs. I tried to come up with a BASH script that uses wget, passing

http://someaddress.com/logs/dbsclog01s*.log 

but I am having problems with my script. Does anyone have a sample of how to do this?

thanks!

wonderer asked Sep 15 '09 11:09


2 Answers

#!/bin/sh

if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]"
        exit 1
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

printf "$url_format\\n" `seq $seq_start $seq_end` | wget -i- "$@"

Save the above as seq_wget, give it execution permission (chmod +x seq_wget), and then run, for example:

 $ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50 
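The script's key step is the printf expansion: printf repeats its format string once per argument, so each number from seq becomes one zero-padded URL for wget -i - to read. A quick sketch of just that expansion, with no download (the format string is the question's example pattern):

```shell
# Show the URL list that the script pipes into wget.
fmt='http://someaddress.com/logs/dbsclog01s%03d.log'
printf "$fmt\n" $(seq 1 3)
# http://someaddress.com/logs/dbsclog01s001.log
# http://someaddress.com/logs/dbsclog01s002.log
# http://someaddress.com/logs/dbsclog01s003.log
```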

Or, if you have Bash 4.0, you could just type

 $ wget http://someaddress.com/logs/dbsclog01s{001..050}.log 
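If your Bash predates 4.0 (so the zero-padded {001..050} brace range is unavailable), a portable loop can build the same list. A sketch, assuming the question's placeholder host:

```shell
# Generate zero-padded URLs, one per line; pipe into `wget -i -` to download.
urls=$(for i in $(seq 1 50); do
    printf 'http://someaddress.com/logs/dbsclog01s%03d.log\n' "$i"
done)
echo "$urls" | head -n 2
# echo "$urls" | wget -i -    # uncomment to actually download
```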

Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.

ephemient answered Sep 19 '22 23:09

curl seems to support ranges. From the man page:

URL    The URL syntax is protocol dependent. You'll find a detailed
       description in RFC 3986.

       You can specify multiple URLs or parts of URLs by writing part sets
       within braces as in:

         http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

         ftp://ftp.numericals.com/file[1-100].txt
         ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
         ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can
       use several ones next to each other:

         http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You can specify any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify step counter for the ranges,
       so that you can get every Nth number or letter:

         http://www.numericals.com/file[1-100:10].txt
         http://www.letters.com/file[a-z:2].txt

You may have noticed that it says "with leading zeros"!
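Applied to the question's file names (the host is the asker's placeholder, not a real server), the curl range syntax would look like this; -O keeps each remote file name:

```shell
# curl expands the [001-100] range itself, requesting each URL in turn.
pattern='http://someaddress.com/logs/dbsclog01s[001-100].log'
echo curl -O "$pattern"
# curl -O "$pattern"    # uncomment to actually download
```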

Dennis Williamson answered Sep 22 '22 23:09