Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I use Wget to download all images into a single folder, from a URL?

Tags:

wget

Try this:

wget -nd -r -P /save/location -A jpeg,jpg,bmp,gif,png http://www.somedomain.com

Here is some more information:

-nd prevents the creation of a directory hierarchy (i.e. no directories).

-r enables recursive retrieval. See Recursive Download for more information.

-P sets the directory prefix where all files and directories are saved to.

-A sets a whitelist for retrieving only certain file types. Strings and patterns are accepted, and both can be used in a comma separated list (as seen above). See Types of Files for more information.


wget -nd -r -l 2 -A jpg,jpeg,png,gif http://t.co
  • -nd: no directories (save all files to the current directory; -P directory changes the target directory)
  • -r -l 2: recursive level 2
  • -A: accepted extensions
wget -nd -H -p -A jpg,jpeg,png,gif -e robots=off example.tumblr.com/page/{1..2}
  • -H: span hosts (wget doesn't download files from different domains or subdomains by default)
  • -p: page requisites (includes resources like images on each page)
  • -e robots=off: execute command robotos=off as if it was part of .wgetrc file. This turns off the robot exclusion which means you ignore robots.txt and the robot meta tags (you should know the implications this comes with, take care).

Example: Get all .jpg files from an exemplary directory listing:

$ wget -nd -r -l 1 -A jpg http://example.com/listing/

I wrote a shellscript that solves this problem for multiple websites: https://github.com/eduardschaeli/wget-image-scraper

(Scrapes images from a list of urls with wget)


Try this one:

wget -nd -r -P /save/location/ -A jpeg,jpg,bmp,gif,png http://www.domain.com

and wait until it deletes all extra information


According to the man page the -P flag is:

-P prefix --directory-prefix=prefix Set directory prefix to prefix. The directory prefix is the directory where all other files and subdirectories will be saved to, i.e. the top of the retrieval tree. The default is . (the current directory).

This mean that it only specifies the destination but where to save the directory tree. It does not flatten the tree into just one directory. As mentioned before the -nd flag actually does that.

@Jon in the future it would be beneficial to describe what the flag does so we understand how something works.