I want to download a lot of urls in a script but I do not want to save the ones that lead to HTTP errors.
As far as I can tell from the man pages, neither curl
or wget
provide such functionality. Does anyone know about another downloader who does?
Unlike curl , the wget command is solely for the retrieval of information from a remote server. By default, the information received is saved with the same name as in the provided URL. You can specify one or more specific DNS servers to use when utilizing wget to access a remote server.
Differences Between wget and cURLWget is a simple transfer utility, while curl offers so much more. Curl provides the libcurl library, which can be expanded into GUI applications. Wget, on the other hand, is a simple command-line utility. Wget supports fewer protocols compared to cURL.
I think the -f
option to curl
does what you want:
-f
,--fail
(HTTP) Fail silently (no output at all) on server errors. This is mostly done to better enable scripts etc to better deal with failed attempts. In normal cases when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why and more). This flag will prevent curl from outputting that and return error 22. [...]
However, if the response was actually a 301 or 302 redirect, that still gets saved, even if its destination would result in an error:
$ curl -fO http://google.com/aoeu $ cat aoeu <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"> <TITLE>301 Moved</TITLE></HEAD><BODY> <H1>301 Moved</H1> The document has moved <A HREF="http://www.google.com/aoeu">here</A>. </BODY></HTML>
To follow the redirect to its dead end, also give the -L
option:
-L
,--location
(HTTP/HTTPS) If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place. [...]
One liner I just setup for this very purpose:
(works only with a single file, might be useful for others)
A=$$; ( wget -q "http://foo.com/pipo.txt" -O $A.d && mv $A.d pipo.txt ) || (rm $A.d; echo "Removing temp file")
This will attempt to download the file from the remote Host. If there is an Error, the file is not kept. In all other cases, it's kept and renamed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With