All,
I would like to get a list of files off of a server with the full url in tact. For example, I would like to get all the TIFFs from here.
http://hyperquad.telascience.org/naipsource/Texas/20100801/*
I can download all the .tif files with wget but I am looking for is just the full url to each file like this.
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_3_20100424.tif http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_1_20100430.tif http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif
Any thoughts on how to get all these files in to a list using something like curl or wget?
Adam
You'd need the server to be willing to give you a page with a listing on it. This would normally be an index.html or just ask for the directory.
http://hyperquad.telascience.org/naipsource/Texas/20100801/
It looks like you're in luck in this case so, at risk of upsetting the web master, the solution would be to use wget's recursive option. Specify a maximum recursion of 1 to keep it constrained to that single directory.
I would use lynx
shell web browser to get the list of links + grep
and awk
shell tools to filter the results, like this:
lynx -dump -listonly <URL> | grep http | grep <regexp> | awk '{print $2}'
..where:
http://hyperquad.telascience.org/naipsource/Texas/20100801/
\.tif$
Complete example commandline to get links to TIF files on this SO page:
lynx -dump -listonly http://stackoverflow.com/questions/6989681/getting-a-list-of-files-on-a-web-server | grep http | grep \.tif$ | awk '{print $2}'
..now returns:
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With