Let's say I've got:
http://svnserver/repoid/appname/language/module/
I'm trying to download everything below the module node. Unfortunately, the page contains links both to items below and to items above in the hierarchy (much like bash ls -ltr output), so when I use wget with the recursive download option I end up downloading the complete website (the whole SVN repository) instead of only the module I need.
Is there any trick to prevent wget from following links that point to parent elements?
The wget tool is essentially a spider that scrapes/leeches web pages, but some web hosts may block these spiders with robots.txt files. Also, wget will not follow links on web pages that use the rel=nofollow attribute. You can, however, force wget to ignore the robots.txt file.
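If a robots.txt file is what is blocking the recursion, wget's documented -e robots=off switch disables that check. A sketch, reusing the example URL from the question (use responsibly):

```shell
# Ignore robots.txt restrictions during the recursive crawl,
# while still staying below the starting directory (-np).
wget -r -np -e robots=off http://svnserver/repoid/appname/language/module/
```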
When running Wget with -r or -p, but without -N or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
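As a sketch of that difference, again with the question's example URL:

```shell
# Without -nc: a second run overwrites files already on disk.
wget -r -np http://svnserver/repoid/appname/language/module/

# With -nc (--no-clobber): files already on disk are kept,
# and newer copies on the server are ignored.
wget -r -np -nc http://svnserver/repoid/appname/language/module/
```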
Only the archive you are interested in will be downloaded. Essentially, '--no-parent' is similar to '-I /~luzer/my-archive', only it handles redirections in a more intelligent fashion. Note that, for HTTP (and HTTPS), the trailing slash is very important to '--no-parent'.
If the download is successful without any errors, wget will return 0. Anything else indicates something went wrong. Take a look at the "Exit status" section of man wget.
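The exit-status check can be scripted like this; the download step below is a placeholder (false stands in for a failed wget run) so the checking pattern itself is visible, and you would substitute the real wget command:

```shell
#!/bin/sh
# Pattern for acting on wget's exit status (0 = success, non-zero = failure).
download() {
    # wget -q -r -np "http://svnserver/repoid/appname/language/module/"
    false   # placeholder standing in for a failed wget run
}

download
status=$?
if [ "$status" -eq 0 ]; then
    echo "download OK"
else
    echo "download failed with exit status $status"
fi
```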
Sounds like you are looking for the --no-parent parameter.
From the output of wget --help
-np, --no-parent don't ascend to the parent directory.
From man wget:
-np
--no-parent
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
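Putting it together for the URL in the question, something like this should fetch only the module subtree. The -nH and --cut-dirs flags are optional tidy-ups for the local directory layout, not required for --no-parent to work:

```shell
# Recursively fetch only the module directory and everything below it.
# -r           : recursive download
# -np          : --no-parent, never ascend above the starting directory
# -nH          : don't create a local "svnserver" host directory (optional)
# --cut-dirs=3 : drop repoid/appname/language from local paths (optional)
# Note the trailing slash -- it matters for --no-parent over HTTP(S).
wget -r -np -nH --cut-dirs=3 http://svnserver/repoid/appname/language/module/
```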