According to the wget man page, --accept-regex is the option to use when I need to selectively transfer files whose names match a certain regular expression. However, I am not sure how to use --accept-regex.
Suppose I want to obtain the files diffs-000107.tar.gz, diffs-000114.tar.gz, diffs-000121.tar.gz, and diffs-000128.tar.gz from the IMDB data directory ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/. "diffs\-0001[0-9]{2}\.tar\.gz" seems to be an OK regex to describe the file names.
However, when executing the following wget command
wget -r --accept-regex='diffs\-0001[0-9]{2}\.tar\.gz' ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/
wget indiscriminately acquires all files in the ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/ directory.
Could anyone tell me what I might have done wrong?
You cannot specify a regular expression with wget's -R option, but you can specify a pattern (like a file name pattern in a shell): $ wget -R 'newsbrief-*' ...
A regular expression to match valid filenames. It can be used to validate filenames entered by a user of an application, or the filename of files uploaded from a scanner. The expression ensures that your filename conforms to specific rules, including no leading or trailing spaces and no use of any characters besides the letters A-Z and numbers 0-9.
Beware that it seems you can use --reject-regex only once per wget call. That is, you have to use | in a single regex if you want to select on several regexes.
Thanks for the example with several regexes. Does --reject-regex work with things like . or *? What kind of regex is it: extended regex or PCRE?
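The single-regex alternation described above can be sketched roughly as follows. This is only an illustration: the example.com URLs and the three patterns are made up, and grep -E is used as a local stand-in for wget's default POSIX regex matching (wget also has a --regex-type option, where pcre is only available if wget was built with it), so the preview may not agree with wget in every corner case.

```shell
#!/usr/bin/env bash
# Hypothetical patterns to reject, joined into ONE regex with '|'.
reject_regex='\.tmp$|/private/|[?]sort='

# Preview locally which URLs the combined regex would reject.
# grep -E approximates POSIX extended syntax; it is not wget itself.
would_reject() {
  printf '%s\n' "$1" | grep -Eq -- "$reject_regex"
}

for url in \
  'http://example.com/data/a.tmp' \
  'http://example.com/private/b.html' \
  'http://example.com/data/c.html'
do
  if would_reject "$url"; then
    echo "reject: $url"
  else
    echo "keep:   $url"
  fi
done

# The corresponding wget call (not run here; example.com is a placeholder):
#   wget -r --reject-regex "$reject_regex" http://example.com/
```

Since the regex is matched against the whole URL, characters such as . and * keep their regex meaning here, which is why the literal dot is escaped and the question mark is wrapped in a bracket expression.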
Be careful: --accept-regex is for the complete URL. But our target is some specific files, so we will use -A.
For example,
wget -r -np -nH -A "IMG[012][0-9].jpg" http://x.com/y/z/
will download all the files from IMG00.jpg to IMG29.jpg from the URL.
Note that a matching pattern contains shell-like wildcards, e.g. ‘books*’ or ‘zelazny*196[0-9]*’.
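Applied to the original question, the -A approach might look like the sketch below. It assumes, as described above, that -A patterns are shell-style wildcards matched against the file name rather than the full URL. The case statement is only a local preview using the shell's own globbing, and the last sample file name is made up for contrast. Wildcards have no {2} repetition, so [0-9] is written twice.

```shell
#!/usr/bin/env bash
# Wildcard equivalent of the regex diffs-0001[0-9]{2}\.tar\.gz from the question.
pattern='diffs-0001[0-9][0-9].tar.gz'

# Local preview: does a file name match the wildcard? (shell globbing, not wget)
matches_pattern() {
  case "$1" in
    $pattern) return 0 ;;
    *)        return 1 ;;
  esac
}

for name in diffs-000107.tar.gz diffs-000128.tar.gz diffs-991231.tar.gz; do
  if matches_pattern "$name"; then
    echo "would fetch: $name"
  else
    echo "would skip:  $name"
  fi
done

# Print the wget command instead of running it, so nothing is downloaded here.
# -np: do not ascend to the parent directory; -nd: do not recreate directories.
echo wget -r -np -nd -A "'$pattern'" \
  'ftp://ftp.fu-berlin.de/pub/misc/movies/database/diffs/'
```

Copying the printed command into a terminal runs the actual download; quoting the pattern keeps the shell from expanding the brackets before wget sees them.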
reference: wget manual: https://www.gnu.org/software/wget/manual/wget.html regex: https://regexone.com/