
"wget --domains" not helping.. what am I doing wrong? [closed]

Tags: wget

I'm attempting to use wget to recursively grab only the .jpg files from a particular website, with a view to creating an amusing screensaver for myself. Not such a lofty goal really.

The problem is that the pictures are hosted elsewhere (mfrost.typepad.com), not on the main domain of the website (www.cuteoverload.com).

I have tried using "-D" to specify the allowed domains, but sadly no cute jpgs have been forthcoming. How could I alter the line below to make this work?

wget -r -l2 -np -w1 -D www.cuteoverload.com,mfrost.typepad.com -A.jpg -R.html.php.gif www.cuteoverload.com/

Thanks.

asked Dec 10 '08 08:12 by nakedfanatic




1 Answer

wget's man page[1] says this about -D:

Set domains to be followed. domain-list is a comma-separated list of domains. Note that it does not turn on -H.

That note points to -H, which the man page describes as follows:

Enable spanning across hosts when doing recursive retrieving.

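So you merely need to add the -H flag to your invocation. Without it, wget never leaves the starting host, and -D on its own only filters the hosts that -H would otherwise allow.

(Having done this, it looks like all the images are restricted to mfrost.typepad.com/cute_overload/images/2008/12/07 and mfrost.typepad.com/cute_overload/images/2008/12/08.)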
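For reference, here is a sketch of what the adjusted invocation might look like. The options are the ones from the question with -H added. Two things here are my own assumptions rather than anything from the man page excerpt above: the WGET_OPTS variable is just a hypothetical name for this example, and I have written the -R list with commas, since wget expects its accept/reject lists to be comma-separated. The command is only printed, not run.

```shell
# Options copied from the question, with -H added so recursion may span hosts.
# WGET_OPTS is a hypothetical variable name used only for this sketch.
# Assumption: the reject list was meant to be three suffixes, so it is comma-separated here.
WGET_OPTS="-r -l2 -np -w1 -H -D www.cuteoverload.com,mfrost.typepad.com -A .jpg -R .html,.php,.gif"

# Dry run: print the command. Remove "echo" to actually start the download.
echo wget $WGET_OPTS www.cuteoverload.com/
```

Keeping -D alongside -H matters: -H alone would let wget follow links to any host, while -D limits it to the two listed domains.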

-- [1] Although wget's primary reference manual is in info format.

answered Nov 12 '22 03:11 by Nietzche-jou