How to get a list of available files using wget or curl?

Tags:

bash

terminal

curl

wget

I'd like to know if it's possible to do an ls of a URL, so I can see what *.js files are available in a website, for example. Something like:

wget --list-files -A.js stackoverflow.com

and get

ajax/libs/jquery/1.7.1/jquery.min.js
js/full.js
js/stub.js
...

asked May 13 '12 11:05

nachocab

2 Answers

You can't do the equivalent of an ls unless the server provides such listings itself. You could however retrieve index.html and then check for includes, e.g. something like

wget -O - http://www.example.com | grep "type=.\?text/javascript.\?"

Note that this relies on the HTML being formatted in a certain way, in this case with each include on its own line. If you want to do this properly, I'd recommend parsing the HTML and extracting the JavaScript includes that way.
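As a rough illustration of extracting the includes without depending on line layout, here is a minimal sketch. It is still regex-based, not a real HTML parser, so it only approximates the "proper" approach. The function name extract_script_srcs and the sample HTML are made up for this example.

```shell
#!/usr/bin/env bash
# Print the src attribute of every <script> tag read from stdin.
# Assumption: this is a regex sketch, not a real HTML parser; it can be
# fooled by unusual markup (e.g. a ">" inside an attribute value).
extract_script_srcs() {
  tr '\n' ' ' |                                       # join lines so a tag may span several
    grep -oiE '<script[^>]*>' |                       # one opening <script ...> tag per line
    grep -oiE "[[:space:]]src=(\"[^\"]*\"|'[^']*')" | # keep only the src attribute
    sed -E "s/^[[:space:]]*[sS][rR][cC]=//; s/^[\"']//; s/[\"']\$//"
}

# Hypothetical sample page: two external scripts (one tag split across
# lines) and one inline script without a src attribute.
sample_html='<html><head>
<script type="text/javascript" src="js/full.js"></script><script
  src="js/stub.js"></script>
<script>var inline = 1;</script>
</head></html>'

printf '%s\n' "$sample_html" | extract_script_srcs
```

Against a live page it could be used as: wget -qO - http://www.example.com | extract_script_srcs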

answered Oct 11 '22 08:10

Lars Kotthoff

Let's consider this open directory (http://tug.ctan.org/macros/latex2e/required/amscls/) as the object of our experimentation. This directory belongs to the Comprehensive TeX Archive Network, so don't be too worried about downloading malicious files.

Now, let's suppose that we want to list all files whose extension is pdf.

The command shown below saves the output of wget in the file main.log. Because wget sends a request for each file and prints some information about each request, we can then grep the output to get a list of the files in the specified directory.

wget \
  --accept '*.pdf' \
  --reject-regex '/\?C=[A-Z];O=[A-Z]$' \
  --execute robots=off \
  --recursive \
  --level=0 \
  --no-parent \
  --spider \
  'http://tug.ctan.org/macros/latex2e/required/amscls/doc/' 2>&1 | tee main.log

Each request line in the log starts with --, so we can use grep to pull out the requested URLs: the directory itself and the files whose extension is pdf.

grep '^--' main.log

--2020-11-23 10:39:46--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsbooka.pdf
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsclass.pdf
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsdtx.pdf
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsmidx.pdf
--2020-11-23 10:39:48--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsthdoc.pdf
--2020-11-23 10:39:48--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/thmtest.pdf
--2020-11-23 10:39:48--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/upref.pdf
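If only the URLs are wanted, without timestamps or duplicates, the request lines can be filtered a bit further. This is a minimal sketch; the function name list_urls is made up for this example, and it assumes the request-line format shown in the output above.

```shell
#!/usr/bin/env bash
# Print the unique URLs found on wget's request lines, which look like
#   --2020-11-23 10:39:47--  http://host/path/file.pdf
# Reads the log on stdin; the optional argument is the extension to keep.
list_urls() {
  local ext=${1:-pdf}
  grep '^--' |            # keep only the request lines
    awk '{print $NF}' |   # the URL is the last whitespace-separated field
    grep "\.${ext}\$" |   # drop the directory URL itself
    sort -u
}

# Sample input: three request lines copied from the output above.
sample_log='--2020-11-23 10:39:46--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/
--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsbooka.pdf
--2020-11-23 10:39:48--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/upref.pdf'

printf '%s\n' "$sample_log" | list_urls pdf
```

With the real log this would be: list_urls pdf < main.log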

Note that we could also list every file in the directory and then filter the output of the command with grep. However, that would take more time, since apparently a request is sent for each file. With the --accept option, we can make wget send requests only for the files we are interested in.

Last but not least, the size of each file is also recorded in main.log, so you can look that information up there.
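One way to read the sizes back out is to pair each "Length:" line that wget prints with the request line before it. This is a minimal sketch; the function name list_sizes is made up for this example, and the sizes in the sample log are placeholders, not the real sizes of these files.

```shell
#!/usr/bin/env bash
# Pair each "Length:" line with the URL of the request line before it.
# Reads a wget log on stdin and prints "SIZE<TAB>URL".
list_sizes() {
  awk '
    /^--/ { url = $NF }            # remember the most recent URL
    /^Length:/ && url != "" {      # size reported for that URL
      print $2 "\t" url
      url = ""
    }
  '
}

# Sample log: the sizes (1000 and 2048) are invented placeholders.
sample_sizes_log='--2020-11-23 10:39:47--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/amsbooka.pdf
HTTP request sent, awaiting response... 200 OK
Length: 1000 [application/pdf]
--2020-11-23 10:39:48--  http://tug.ctan.org/macros/latex2e/required/amscls/doc/upref.pdf
HTTP request sent, awaiting response... 200 OK
Length: 2048 (2.0K) [application/pdf]'

printf '%s\n' "$sample_sizes_log" | list_sizes
```

With the real log this would be: list_sizes < main.log. When the server does not report a size, wget prints "Length: unspecified", and the sketch passes that word through unchanged.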

answered Oct 11 '22 07:10

doltes
