I'd like to know if it's possible to do an ls of a URL, so I can see what *.js files are available on a website, for example. Something like:
wget --list-files -A.js stackoverflow.com
and get
ajax/libs/jquery/1.7.1/jquery.min.js
js/full.js
js/stub.js
...
You can't do the equivalent of an ls unless the server itself provides such a listing. You could, however, retrieve index.html and then check for script includes, e.g. something like
wget -O - http://www.example.com | grep "type=.\?text/javascript.\?"
Note that this relies on the HTML being formatted in a certain way -- in this case with each include on its own line. If you want to do this properly, I'd recommend parsing the HTML and extracting the JavaScript includes that way.
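A minimal sketch of the "parse the HTML properly" approach, using only Python's standard library. The HTML string here is a made-up stand-in for a page you would fetch with wget or urllib; real pages will of course differ.

```python
from html.parser import HTMLParser

class ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag."""
    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.scripts.append(src)

# Stand-in HTML; in practice this would come from the fetched page.
html = """
<html><head>
<script src="js/full.js" type="text/javascript"></script>
<script src="js/stub.js"></script>
</head></html>
"""

parser = ScriptSrcParser()
parser.feed(html)
print(parser.scripts)  # ['js/full.js', 'js/stub.js']
```

Unlike the grep one-liner, this works regardless of how the HTML is line-wrapped or which quote style the attributes use.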
Let's consider this open directory (http://tug.ctan.org/macros/latex2e/required/amscls/) as the object of our experimentation. This directory belongs to the Comprehensive TeX Archive Network, so don't be too worried about downloading malicious files.
Now, let's suppose that we want to list all files whose extension is pdf. We can do so by executing the following command.
The command shown below saves the output of wget in the file main.log. Because wget sends a request for each file and prints some information about each request, we can then grep the output to get a list of the files that belong to the specified directory.
wget \
--accept '*.pdf' \
--reject-regex '/\?C=[A-Z];O=[A-Z]$' \
--execute robots=off \
--recursive \
--level=0 \
--no-parent \
--spider \
'http://tug.ctan.org/macros/latex2e/required/amscls/doc/' 2>&1 | tee main.log
Now, we can list the files whose extension is pdf by using grep.
grep '^--' main.log
--2020-11-23 10:39:46-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/
--2020-11-23 10:39:47-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/
--2020-11-23 10:39:47-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsbooka.pdf
--2020-11-23 10:39:47-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsclass.pdf
--2020-11-23 10:39:47-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsdtx.pdf
--2020-11-23 10:39:47-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsmidx.pdf
--2020-11-23 10:39:48-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsthdoc.pdf
--2020-11-23 10:39:48-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/thmtest.pdf
--2020-11-23 10:39:48-- http://tug.ctan.org/macros/latex2e/required/amscls/doc/upref.pdf
Note that we could also get the list of all files in the directory and then run grep on the output. However, that would take more time, since a request is sent for each file. By using the --accept option, we make wget send requests only for the files we are interested in.
Last but not least, the sizes of the files are also recorded in main.log, so you can check that information there.
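If you prefer post-processing the log in a script rather than with grep, here is a small sketch. It assumes the log format shown above, where each request line starts with "--" and ends with the URL; the sample text is taken from the output listed earlier.

```python
# Sample lines copied from the wget log shown above; a real main.log
# would be read with open("main.log") and contain many more lines.
log = """\
--2020-11-23 10:39:46--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsbooka.pdf
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsclass.pdf
"""

pdf_urls = [
    line.split()[-1]                  # last whitespace-separated field is the URL
    for line in log.splitlines()
    if line.startswith("--") and line.endswith(".pdf")
]
print(pdf_urls)
```

This keeps only the request lines and drops the directory listing itself (its URL doesn't end in .pdf).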