
How can wget save only certain file types linked to from pages linked to by the target page?

Tags:

linux

wget

How can wget save only certain file types linked to from pages linked to by the target page, regardless of the domain in which the certain files are?

Trying to speed up a task I have to do often.

I've been rooting through the wget docs and googling, but nothing seems to work. I keep on either getting just the target page or the subpages without the files (even using -H), so I'm obviously doing badly at this.

So, essentially, example.com/index1/ contains links to example.com/subpage1/ and example.com/subpage2/, while the subpages contain links to example2.com/file.ext, example2.com/file2.ext, etc. However, example.com/index1/ may also link to example.com/index2/, which has links to more subpages I don't want.

Can wget even do this, and if not then what do you suggest I use? Thanks.

asked Jul 10 '11 20:07 by Nomen




1 Answer

The following command worked for me:

wget -r --accept "*.ext" --level 2 "example.com/index1/"

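The download has to be recursive, so -r is needed. --level 2 limits the recursion to two hops from the target page (target page, then subpages, then files), and --accept "*.ext" keeps only files whose names match that pattern.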
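Since the files in the question sit on a different host (example2.com), wget will probably also need -H to follow links off example.com. Here is a rough sketch of that variant, not something tested against the asker's site. The function name fetch_linked_files and the -nd flag are my own additions for illustration, and the URL and extension are just the placeholders from the question.

```shell
#!/bin/sh
# Sketch: fetch files with a given extension that are linked from pages
# one hop away from the start page, even when the files are on another host.
# fetch_linked_files is a hypothetical helper name, not part of wget.

fetch_linked_files() {
    start_url=$1   # e.g. http://example.com/index1/
    ext=$2         # e.g. ext

    # -r          recurse into linked pages
    # -l 2        depth limit: start page -> subpages -> files
    # -H          allow following links onto other hosts
    # -nd         do not recreate the remote directory tree locally
    # -A "*.ext"  keep only matching files; HTML pages are still fetched
    #             so their links can be read, then deleted
    wget -r -l 2 -H -nd -A "*.${ext}" "$start_url"
}

# Usage (uncomment to run against a real site):
# fetch_linked_files "http://example.com/index1/" "ext"
```

With a depth of 2, files linked from the subpages of example.com/index2/ are three hops away and should be skipped, although anything matching the pattern that index2 links to directly would still be saved. If -H wanders too far, adding -D example.com,example2.com restricts the crawl to those domains.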

answered Sep 17 '22 13:09 by TheKojuEffect