Is there a curl/wget option that prevents saving files in case of http errors?

I want to download a lot of URLs in a script, but I do not want to save the ones that lead to HTTP errors.

As far as I can tell from the man pages, neither curl nor wget provides such functionality. Does anyone know of another downloader that does?

akiva asked Sep 18 '08 04:09



2 Answers

I think the -f option to curl does what you want:

-f, --fail

(HTTP) Fail silently (no output at all) on server errors. This is mostly done to better enable scripts etc to better deal with failed attempts. In normal cases when an HTTP server fails to deliver a document, it returns an HTML document stating so (which often also describes why and more). This flag will prevent curl from outputting that and return error 22. [...]

However, if the response was actually a 301 or 302 redirect, that still gets saved, even if its destination would result in an error:

$ curl -fO http://google.com/aoeu
$ cat aoeu
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/aoeu">here</A>.
</BODY></HTML>

To follow the redirect to its dead end, also give the -L option:

-L, --location

(HTTP/HTTPS) If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place. [...]
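
For the batch case in the question, the two flags can be combined with a temporary file so that nothing is left behind on failure. This is only a minimal sketch: the function name fetch, the ".part" suffix, and the urls.txt file are assumptions made for illustration, not part of curl.

```shell
#!/bin/sh
# Hypothetical helper: download one URL and keep the file only if curl succeeds.
# -f: fail on HTTP errors (exit code 22 instead of saving the error page)
# -L: follow redirects, so a redirect that ends in an error also fails
# -sS: hide the progress meter but still print error messages
fetch() {
    url=$1
    dest=$2
    tmp="$dest.part"
    if curl -fsSL -o "$tmp" "$url"; then
        mv "$tmp" "$dest"
    else
        status=$?
        # Remove any partial output so failed URLs leave no file behind.
        rm -f "$tmp"
        echo "skipped $url (curl exit code $status)" >&2
        return 1
    fi
}

# Usage, assuming urls.txt holds one URL per line:
# while IFS= read -r url; do fetch "$url" "$(basename "$url")"; done < urls.txt
```

Writing to a temporary name first means a half-finished transfer never appears under the final file name.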

Thomas answered Nov 16 '22 12:11


A one-liner I just set up for this very purpose:

(works only with a single file, might be useful for others)

A=$$; ( wget -q "http://foo.com/pipo.txt" -O "$A.d" && mv "$A.d" pipo.txt ) || ( rm -f "$A.d"; echo "Removing temp file" )

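This attempts to download the file from the remote host into a temporary file named after the shell's process ID. If wget exits with an error (it returns a non-zero status on HTTP errors), the temporary file is removed. Otherwise, it is kept and renamed.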
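
The same idea can be wrapped in a function so it works for more than one file. The following is a rough sketch rather than a tested tool: the name save_if_ok is made up, and it assumes mktemp is available (it is common but not required by POSIX).

```shell
#!/bin/sh
# Hypothetical helper: run a download command that writes to a temp file,
# and move the temp file into place only when the command exits with 0.
# The temp file path is appended as the LAST argument of the command.
save_if_ok() {
    dest=$1
    shift
    tmp=$(mktemp "$dest.XXXXXX") || return 1
    if "$@" "$tmp"; then
        mv "$tmp" "$dest"
    else
        # The command failed (e.g. wget on an HTTP error): discard the output.
        rm -f "$tmp"
        return 1
    fi
}

# Example: wget takes the output file as the argument to -O, so -O goes last.
# save_if_ok pipo.txt wget -q "http://foo.com/pipo.txt" -O
```

Note that mktemp creates the file readable only by its owner, and mv preserves that, so a chmod may be needed afterwards.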

Oct answered Nov 16 '22 11:11