
Using wget to download select directories from ftp server

I'm trying to understand how to use wget to download specific directories from a bunch of different FTP sites with economic data from the US government.

As a simple example, I know that I can download an entire directory using a command like:

wget --timestamping --recursive --no-parent ftp://ftp.bls.gov/pub/special.requests/cew/2013/county/

But I envision running more complex downloads, where I might want to limit a download to a handful of directories. So I've been looking at the --include option. But I don't really understand how it works. Specifically, why doesn't this work:

wget --timestamping --recursive -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/

The following does work, in the sense that it downloads files, but it downloads far more than I need (everything in the 2013 directory, rather than just the county subdirectory):

wget --timestamping --recursive -I /pub/special.requests/cew/2013/ ftp://ftp.bls.gov/pub/special.requests/cew/

I can't tell if I'm not understanding something about wget, or if my issue is with something more fundamental to FTP server structures.

Thanks for the help!

Al R. asked Dec 23 '13 21:12

People also ask

Can wget download folders?

wget will download all files and subfolders under your folder URL. If you want to download only the files directly in the target folder, and not its subfolders, use the -l1 option. If you want the folder plus one level of subfolders (e.g. www.example.com/products/category), use the -l2 option.
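As a sketch of how the depth flags combine with recursion, the snippet below builds the two variants as strings and prints them (the URL is a placeholder, not one from the question; the commands are echoed rather than run to avoid a network fetch):

```shell
# Illustrative base URL -- substitute a real FTP or HTTP directory.
BASE="ftp://ftp.example.com/pub/data/"

# -l1: only files directly under $BASE, no descent into subfolders.
CMD1="wget --recursive -l1 --no-parent $BASE"

# -l2: $BASE plus one level of subfolders.
CMD2="wget --recursive -l2 --no-parent $BASE"

echo "$CMD1"
echo "$CMD2"
```

Dropping `echo` and running the commands directly would perform the actual downloads.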

Does wget work with FTP?

Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix ls output).


1 Answer

Based on the wget manual (linked below), it seems that the filtering functions of wget are quite limited.

When using the --recursive option, wget will download all linked documents after applying the various filters, such as --no-parent and the -I, -X, -A, -R options.

In your example:

wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/

This won't download anything, because the -I option tells wget to follow only links matching /pub/special.requests/cew/2013/county/, but the listing at /pub/special.requests/cew/ contains no such links, so recursion stops there. This will work though:

wget -r -I /pub/special.requests/cew/2013/county/ ftp://ftp.bls.gov/pub/special.requests/cew/2013/

... because in this case the /pub/special.requests/cew/2013/ listing does have a link to county/.
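Putting this together for the original goal (county/ only, with timestamping), a minimal sketch using the BLS paths from the question; the command is echoed rather than executed here, to avoid a network fetch:

```shell
# Start recursion at the 2013/ listing, which directly links to county/,
# so the -I filter can match a link and descend into it.
TARGET="ftp://ftp.bls.gov/pub/special.requests/cew/2013/"
INCLUDE="/pub/special.requests/cew/2013/county/"

CMD="wget --timestamping --recursive --no-parent -I $INCLUDE $TARGET"
echo "$CMD"
# To actually download:  eval "$CMD"
```

The key point is that the -I path must be reachable via links from the start URL's own listing, so start one level above the directory you want, not higher.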

Btw, you can find more details in this doc than in the man page:

http://www.gnu.org/software/wget/manual/html_node/

janos answered Oct 02 '22 11:10