 

R: download data securely using TLS/SSL


Official Statements

In the past, base R's download.file() could not handle HTTPS URLs, and it was necessary to use RCurl. Since R 3.3.0:

All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. ... Different access methods may allow different protocols or use private certificate bundles ...

The download.file() help still says:

Contributed package 'RCurl' provides more comprehensive facilities to download from URLs.

which, by the way, also provides cookie and header management.

According to the RCurl FAQ (look for "When I try to interact with a URL via https, I get an error"), HTTPS URLs can be handled with:

getURL(url, cainfo="CA bundle") 

where "CA bundle" is the path to a certificate authority (CA) bundle file. One such bundle is available from the curl site itself:
https://curl.haxx.se/ca/cacert.pem
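A minimal sketch of the FAQ approach, assuming the bundle has already been saved locally as cacert.pem. The helper name fetch_with_bundle is hypothetical (not part of RCurl); it fails early with a readable message if the bundle file is missing, instead of letting libcurl report a less obvious SSL error.

```r
# Hypothetical helper: fetch an HTTPS URL with RCurl, verifying the server
# certificate against a local CA bundle (the default path is an assumption).
fetch_with_bundle <- function(url, bundle = "cacert.pem") {
  if (!file.exists(bundle)) stop("CA bundle not found: ", bundle)
  RCurl::getURL(url,
                cainfo = normalizePath(bundle),  # absolute path to the PEM file
                ssl.verifypeer = TRUE)           # keep certificate checks on
}
```

Usage: fetch_with_bundle("https://www.google.com", bundle = "cacert.pem").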

Current status

Tests were run on Windows.

For many HTTPS websites, download.file() works as stated:

download.file(url="https://www.google.com", destfile="google.html")
download.file(url="https://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
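For comparison, the method used by download.file() can be pinned explicitly. As far as I know, in R 3.x the Windows default for https: URLs was "wininet" (the Windows internet API, with its own TLS stack and certificate store), whereas method = "libcurl" goes through libcurl; this may be why download.file() and RCurl behave differently on the same URL. The helper names below are hypothetical; the sketch prefers libcurl when capabilities("libcurl") reports it.

```r
# Hypothetical helper: pick "libcurl" if this R build supports it, else "auto".
pick_download_method <- function() {
  if (isTRUE(unname(capabilities("libcurl")))) "libcurl" else "auto"
}

# Hypothetical wrapper around download.file() with an explicit method.
download_https <- function(url, destfile, method = pick_download_method()) {
  # mode = "wb" avoids newline translation of binary files on Windows
  status <- download.file(url, destfile = destfile, method = method,
                          mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}
```

Usage: download_https("https://curl.haxx.se/ca/cacert.pem", "cacert.pem").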

As regards RCurl, using the cacert.pem bundle downloaded above, one may get an error:

library(RCurl)
getURL("https://www.google.com", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE)  :
#   SSL certificate problem: unable to get local issuer certificate

In this instance, simply removing the reference to the certificate bundle solves the problem:

getURL("https://www.google.com")                      # works
getURL("https://www.google.com", ssl.verifypeer=TRUE) # works

ssl.verifypeer = TRUE is set explicitly to make sure that the success is not due to getURL() silently skipping certificate verification. The argument is documented in the RCurl FAQ.

However, in other instances, the connection fails:

getURL("https://curl.haxx.se/ca/cacert.pem")
# Error in function (type, msg, asError = TRUE)  :
#   error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

And similarly, using the previously downloaded bundle:

getURL("https://curl.haxx.se/ca/cacert.pem", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE)  :
#   error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

The same error occurs even when certificate verification is disabled:

getURL("https://curl.haxx.se/ca/cacert.pem", ssl.verifypeer=FALSE)
# same error as above
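To repeat these checks across several URLs and settings without stopping at the first failure, one could wrap getURL() in tryCatch(). A sketch, where probe_https is a hypothetical helper and any extra arguments are passed straight to RCurl::getURL():

```r
# Hypothetical helper: try each URL with RCurl::getURL() and return "ok"
# or the error message, one entry per URL.
probe_https <- function(urls, ...) {
  vapply(urls, function(u) {
    tryCatch({
      RCurl::getURL(u, ...)  # content is discarded; only success matters here
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}
```

Usage: probe_https(c("https://www.google.com", "https://curl.haxx.se/ca/cacert.pem"), ssl.verifypeer = TRUE).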

Questions

  1. How do we use HTTPS properly with RCurl?
  2. As regards mere file downloads (no headers, cookies, etc.), is there any benefit in using RCurl instead of download.file()?
  3. Has RCurl become obsolete, and should we opt for curl instead?

Update

The issue persists as of R version 3.4.1 (2017-06-30) under Windows 10.
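The error "tlsv1 alert protocol version" means the server rejected the TLS version the client offered during the handshake, so the TLS library RCurl was built against is worth checking. A sketch that reports it next to the one used by the curl package; tls_backends is a hypothetical name, and both packages are treated as optional.

```r
# Hypothetical helper: report the TLS library each HTTP package links against.
# Packages that are not installed are reported as "not installed".
tls_backends <- function() {
  ssl_of <- function(pkg, fun) {
    if (!requireNamespace(pkg, quietly = TRUE)) return("not installed")
    v <- getExportedValue(pkg, fun)()$ssl_version  # a version string from libcurl
    if (is.null(v)) NA_character_ else as.character(v)[1]
  }
  c(RCurl = ssl_of("RCurl", "curlVersion"),
    curl  = ssl_of("curl", "curl_version"))
}

tls_backends()
```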

antonio asked Apr 21 '17 17:04




1 Answer

The OpenSSL library bundled with RCurl is currently rather old and does not support TLS v1.2.

Yes, the curl package works fine.

Alternatively, you can use the httr package, which is a wrapper for the curl package:

> library("httr")
> GET("https://curl.haxx.se/ca/cacert.pem", config(sslversion=6, ssl_verifypeer=1))
Response [https://curl.haxx.se/ca/cacert.pem]
  Date: 2017-08-16 17:07
  Status: 200
  Content-Type: application/x-pem-file
  Size: 256 kB
<BINARY BODY>
Satie answered Sep 22 '22 08:09
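A similar request using the curl package directly might look like the sketch below. In libcurl, sslversion = 6 is CURL_SSLVERSION_TLSv1_2, which is what the httr call above requests. The function name fetch_bundle and the destination file name are illustrative choices, not part of either package.

```r
# Sketch: download the CA bundle with the curl package over TLS v1.2,
# keeping certificate verification on. fetch_bundle is a hypothetical name.
fetch_bundle <- function(url = "https://curl.haxx.se/ca/cacert.pem",
                         destfile = "cacert.pem") {
  h <- curl::new_handle(
    sslversion = 6L,       # CURL_SSLVERSION_TLSv1_2, as in the httr example
    ssl_verifypeer = 1L    # verify the server certificate
  )
  curl::curl_download(url, destfile, quiet = TRUE, handle = h)
}
```

Usage: fetch_bundle() downloads the bundle to cacert.pem and returns that path invisibly.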