Now that the whole world is clamoring to use SSL all the time (a decision that makes a lot of sense), some of us who have used GitHub and related services to store CSV files have a bit of a challenge. The read.csv() function does not support SSL when reading from a URL. To get around this I'm doing a little dance I like to call the SSL kabuki dance: I grab the text file with RCurl, write it to a temp file, then read it with read.csv(). Is there a smoother way of doing this? Are there better workarounds?
Here's a simple example of the SSL kabuki:
require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
temporaryFile <- tempfile()
con <- file(temporaryFile, open = "w")
cat(myCsv, file = con)
close(con)
read.csv(temporaryFile)
No need to write it to a file; just use textConnection():
require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
WhatJDwants <- read.csv(textConnection(myCsv))
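A related variant, sketched here without any network access: read.csv() also accepts the CSV contents directly through its text argument, which saves the explicit textConnection() call. The string below is a made-up stand-in for what getURL() would return, not the contents of the gist above.

```r
# Hypothetical stand-in for the string returned by getURL(); the data is invented.
myCsv <- "id,value\n1,10\n2,20\n"

# Option 1: wrap the string in a text connection, as in the answer above.
con <- textConnection(myCsv)
viaConnection <- read.csv(con)
close(con)  # release the connection once the data has been read

# Option 2: hand the string to read.csv() through its text argument.
viaText <- read.csv(text = myCsv)

# Both routes should yield the same data frame.
identical(viaConnection, viaText)
```

Either form avoids the temporary file; the text argument is simply a little shorter when the whole file is already in memory as a string.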
Using Dirk's advice to explore the method= argument resulted in this slightly more concise approach, which does not depend on the external RCurl package.
temporaryFile <- tempfile()
download.file("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv", destfile = temporaryFile, method = "curl")
read.csv(temporaryFile)
But it appears that I can't just set options("download.file.method"="curl") and have read.csv() use it.
Yes, see help(download.file), which is pointed to by read.csv() and all its cousins. The method= argument there has:
method
Method to be used for downloading files. Currently download methods "internal", "wget", "curl" and "lynx" are available, and there is a value "auto": see ‘Details’. The method can also be set through the option "download.file.method": see options().
and you can then set this option via options():
download.file.method:
Method to be used for download.file. Currently download methods "internal", "wget" and "lynx" are available. There is no default for this option, when method = "auto" is chosen: see download.file.
to turn to the external program curl, rather than the RCurl package.
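As a rough sketch of what setting that option looks like in practice (this assumes a curl binary is installed for any later download, and the variable names are only illustrative):

```r
# Set the option and keep the previous value so it can be restored later.
oldOptions <- options(download.file.method = "curl")

# Confirm what download.file() will now pick up when no method is given.
currentMethod <- getOption("download.file.method")
print(currentMethod)

# A call such as download.file(url, destfile) without a method argument
# would use the external curl program at this point.

# Restore the previous setting.
options(oldOptions)
```

Saving and restoring the old value keeps the change from leaking into the rest of the session.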
Edit: Looks like I was half-right and half-wrong. read.csv() et al. do not use the selected method; one needs to manually employ download.file() (which then uses curl or other selected methods). Other functions that do use download.file() (such as package installation or updates) will profit from setting the option, but for JD's initial query concerning CSV files over https, an explicit download.file() is needed before read.csv() of the downloaded file.
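The two-step pattern described in the edit can be wrapped in a small helper. This is only a sketch: readCsvFromUrl is a hypothetical name, not a function from base R or any package, and the default method = "curl" assumes the external curl program is on the path.

```r
# Hypothetical helper: download a CSV to a temp file, then read it.
readCsvFromUrl <- function(url, method = "curl", ...) {
  temporaryFile <- tempfile(fileext = ".csv")
  # Remove the temp file when the function exits, even on error.
  on.exit(unlink(temporaryFile), add = TRUE)

  # download.file() returns 0 (invisibly) on success.
  status <- download.file(url, destfile = temporaryFile,
                          method = method, quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }

  # Extra arguments in ... are passed through to read.csv().
  read.csv(temporaryFile, ...)
}

# Usage with the gist from the question (not run here; needs network and curl):
# dat <- readCsvFromUrl("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
```

Keeping method as an argument means the same helper can fall back to another download method on systems without curl.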
R core should open up the R connections as a C API. I proposed this in the past, with no response:
https://stat.ethz.ch/pipermail/r-devel/2006-October/043056.html