This code attempts to download a page that does not exist:
url <- "https://en.wikipedia.org/asdfasdfasdf"
status_code <- download.file(url, destfile = "output.html", method = "libcurl")
This returns a 404 error:
trying URL 'https://en.wikipedia.org/asdfasdfasdf'
Error in download.file(url, destfile = "output.html", method = "libcurl") :
cannot open URL 'https://en.wikipedia.org/asdfasdfasdf'
In addition: Warning message:
In download.file(url, destfile = "output.html", method = "libcurl") :
cannot open URL 'https://en.wikipedia.org/asdfasdfasdf': HTTP status was '404 Not Found'
but the status_code variable still contains a 0, even though the documentation for download.file states that the returned value is:
An (invisible) integer code, 0 for success and non-zero for failure. For the "wget" and "curl" methods this is the status code returned by the external program. The "internal" method can return 1, but will in most cases throw an error.
The results are the same if I use curl or wget as the download method. What am I missing here? Is the only option to call warnings() and parse the output?
I've seen other questions about using download.file, but none (that I can find) that actually retrieve the HTTP status code.
Probably the best option is to use the cURL library directly rather than via the download.file wrapper, which does not expose the full functionality of cURL. We can do this, for example, using the RCurl package (although other packages such as httr, or system calls, can achieve the same thing). Using cURL directly gives you access to the cURL info for the request, including the response code. For example:
library(RCurl)
curl <- getCurlHandle()
x <- getURL("https://en.wikipedia.org/asdfasdfasdf", curl = curl)
write(x, "output.html")
getCurlInfo(curl)$response.code
# [1] 404
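If you prefer the newer curl package over RCurl, curl_fetch_memory() exposes the status code in much the same way. This is a sketch of mine, not part of the original answer, and assumes the curl package is installed:

```r
# Sketch using the 'curl' package instead of RCurl (my addition, not from
# the original answer). curl_fetch_memory() does not error on 4xx/5xx, so
# the status code can be inspected directly.
library(curl)

res <- curl_fetch_memory("https://en.wikipedia.org/asdfasdfasdf")
res$status_code                        # HTTP status, e.g. 404 here
writeBin(res$content, "output.html")   # save the body regardless of status
```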
Although the first option above is much cleaner, if you really want to use download.file instead, one possible way would be to capture the warning using withCallingHandlers:
try(withCallingHandlers(
  download.file(url, destfile = "output.html", method = "libcurl"),
  warning = function(w) {
    my.warning <<- sub(".+HTTP status was ", "", conditionMessage(w))
  }),
  silent = TRUE)
cat(my.warning)
# '404 Not Found'
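Building on the same mechanism, the pattern can be wrapped in a small helper. The function name here is hypothetical, my own sketch rather than part of the original answer:

```r
# Hypothetical wrapper (my sketch) around the withCallingHandlers pattern
# above: downloads a file and returns the HTTP status text pulled from any
# warning that download.file raises, or "OK" if no warning occurred.
download_with_status <- function(url, destfile) {
  status <- "OK"
  try(withCallingHandlers(
    download.file(url, destfile = destfile, method = "libcurl"),
    warning = function(w) {
      status <<- sub(".+HTTP status was ", "", conditionMessage(w))
    }),
    silent = TRUE)
  status
}

download_with_status("https://en.wikipedia.org/asdfasdfasdf", "output.html")
```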
If you don't mind using a different method, you can try GET from the httr package:
url_200 <- "https://en.wikipedia.org/wiki/R_(programming_language)"
url_404 <- "https://en.wikipedia.org/asdfasdfasdf"
# OK
raw_200 <- httr::GET(url_200)
raw_200$status_code
#> [1] 200
# Not found
raw_404 <- httr::GET(url_404)
raw_404$status_code
#> [1] 404
Created on 2019-01-02 by the reprex package (v0.2.1)
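Extending that answer (a sketch of mine, not part of the original), httr can also translate the code into a human-readable message, save the response body to disk, and raise an R error for failing statuses:

```r
# Sketch extending the httr answer (my addition; assumes httr is installed).
library(httr)

resp <- GET("https://en.wikipedia.org/asdfasdfasdf")
http_status(resp)$message                            # e.g. a "Client error" 404 message
writeBin(content(resp, as = "raw"), "output.html")   # save body either way
stop_for_status(resp)                                # error on 4xx/5xx responses
```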