How to force wget to skip links leading to parent elements?

Tags: bash, wget

Let's say I've got:

http://svnserver/repoid/appname/language/module/

I'm trying to download everything below the module node. Unfortunately, the page contains links to items both below and above it in the hierarchy (much like bash ls -ltr output). So when I run wget with the recursive download option, I end up downloading the complete website (the whole SVN repository) instead of just the module I need.

Is there any trick to prevent wget from following links that point to parent elements?

asked Feb 13 '12 at 10:02 by Paweł Staniec


People also ask

What is wget spider?

Wget is essentially a spider: it crawls web pages and downloads them. Some web hosts try to block such spiders with robots.txt rules, and wget will not follow links on pages that use the rel=nofollow attribute. You can, however, force wget to ignore robots.txt, for example with -e robots=off. (This is separate from the --spider option, which only checks that pages exist without downloading them.)

What is -nc in wget?

When running Wget with -r or -p, but without -N or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

What is --no-parent in wget?

It stops a recursive download from ascending above the starting directory. For example, when you mirror a directory such as /~luzer/my-archive/, only that archive is downloaded. Essentially, --no-parent is similar to -I/~luzer/my-archive, except that it handles redirections more intelligently. Note that, for HTTP (and HTTPS), the trailing slash is very important to --no-parent.

How do I know if my wget is successful?

If the download succeeds without errors, wget returns 0. Any other exit status indicates that something went wrong. See the "Exit status" section of man wget.
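A minimal sketch of checking that exit status in a script. The helper name check_download is hypothetical; the only thing relied on is that wget exits with 0 on success and non-zero otherwise, as described above.

```shell
#!/usr/bin/env bash
# check_download: hypothetical helper that runs a recursive, no-parent download
# and reports wget's exit status (0 = success, anything else = failure).
check_download() {
  wget -r -np "$1"
  local status=$?   # $? is expanded before 'local' runs, so this is wget's status
  if [ "$status" -eq 0 ]; then
    echo "download ok"
  else
    echo "wget failed with exit status $status" >&2
  fi
  return "$status"
}

# Usage:
# check_download http://svnserver/repoid/appname/language/module/
```

The function passes wget's status through unchanged, so callers can still branch on the specific code listed in the manual.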


2 Answers

Sounds like you are looking for the --no-parent parameter.

From the output of wget --help:

 -np, --no-parent                 don't ascend to the parent directory.
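Applied to the URL from the question, the fix is to add -np to the recursive download. The wrapper below is a hypothetical convenience (the name fetch_module is not part of wget). It also appends a trailing slash so that wget treats module as the starting directory.

```shell
#!/usr/bin/env bash
# fetch_module: hypothetical wrapper around a recursive wget that never ascends
# above the starting directory.
fetch_module() {
  local url="$1"
  # Ensure a trailing slash so the last path segment is treated as a directory.
  case "$url" in
    */) ;;
    *) url="$url/" ;;
  esac
  # -r: recursive retrieval; -np: don't ascend to the parent directory.
  wget -r -np "$url"
}

# Usage:
# fetch_module http://svnserver/repoid/appname/language/module/
```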
answered Sep 19 '22 at 07:09 by a_horse_with_no_name


From man wget:

-np, --no-parent

Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
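The wget manual also notes that, for HTTP(S), the trailing slash is very important to --no-parent. The sketch below is a simplified model of why, and it is an assumption rather than wget's actual source. The model: a link is accepted only if it starts with the directory part of the start URL, meaning everything up to and including the last "/". Without the trailing slash, the directory becomes .../language/, so sibling modules are allowed again. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (assumption) of the --no-parent check.

# start_dir: keep everything up to and including the last "/".
#   .../language/module/ -> .../language/module/
#   .../language/module  -> .../language/
start_dir() {
  printf '%s\n' "${1%/*}/"
}

# allowed_by_no_parent START LINK: succeed if LINK lies under START's directory.
allowed_by_no_parent() {
  local dir
  dir=$(start_dir "$1")
  case "$2" in
    "$dir"*) return 0 ;;   # quoted part matched literally, * matches the rest
    *) return 1 ;;
  esac
}

# Usage:
# allowed_by_no_parent http://svnserver/repoid/appname/language/module/ \
#   http://svnserver/repoid/appname/language/   # fails: this is the parent
```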

answered Sep 20 '22 at 07:09 by Oleg Mikheev