Reading CSV files over SSL with R

Tags: r, ssl, rcurl

Now that the whole world is clamoring to use SSL all the time (a decision that makes a lot of sense), those of us who use GitHub and related services to store CSV files have a bit of a challenge: the read.csv() function does not support SSL when reading from a URL. To get around this, I do a little dance I like to call the SSL kabuki dance. I grab the text file with RCurl, write it to a temp file, then read it with read.csv(). Is there a smoother way of doing this? Are there better workarounds?

Here's a simple example of the SSL kabuki:

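require(RCurl)
# Fetch the CSV over HTTPS as a single character string
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
# Write that text out to a temporary file
temporaryFile <- tempfile()
con <- file(temporaryFile, open = "w")
cat(myCsv, file = con)
close(con)

# Read the temporary file like any local CSV
read.csv(temporaryFile)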
JD Long asked Nov 08 '10 16:11


4 Answers

No need to write it to a file - just use textConnection()

require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
WhatJDwants <- read.csv(textConnection(myCsv))
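The same idea can be tried offline. The sketch below uses an inline string as a stand-in for the text that getURL() returns; the sample data and variable names are made up for illustration. It also shows the text= argument of read.csv(), which, assuming a reasonably recent version of R, does the same job without creating the connection by hand.

```r
# Stand-in for the character string that getURL() would return (made-up data).
csv_text <- "id,value\n1,10\n2,20\n3,30\n"

# Option 1: wrap the string in a text connection and pass it to read.csv().
# A connection we create ourselves is already open, so we close it ourselves.
con <- textConnection(csv_text)
from_connection <- read.csv(con)
close(con)

# Option 2: hand the string to read.csv() through its text= argument.
from_text <- read.csv(text = csv_text)

# Both routes should yield the same data frame.
print(all.equal(from_connection, from_text))
```

Closing the connection explicitly avoids the "closing unused connection" warning that can appear when a textConnection() is created inline and never closed.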
Sean answered Oct 12 '22 03:10


Following Dirk's advice to explore the method= argument led me to this slightly more concise approach, which does not depend on the external RCurl package.

temporaryFile <- tempfile()
# Let the external curl program handle the HTTPS download
download.file("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv", destfile = temporaryFile, method = "curl")
read.csv(temporaryFile)

However, it appears that I can't just set options("download.file.method" = "curl") and then call read.csv() on the URL directly.

JD Long answered Oct 12 '22 02:10


Yes: see help(download.file), which the help pages for read.csv() and all its cousins point to. The method= argument is documented there as:

method Method to be used for downloading files. Currently download methods "internal", "wget", "curl" and "lynx" are available, and there is a value "auto": see ‘Details’. The method can also be set through the option "download.file.method": see options().

and you can then set this option via options():

download.file.method: Method to be used for download.file. Currently download methods "internal", "wget" and "lynx" are available. There is no default for this option, when method = "auto" is chosen: see download.file.

to turn to the external program curl, rather than the RCurl package.

Edit: Looks like I was half-right and half-wrong. read.csv() et al. do not use the selected method; one needs to call download.file() manually (which then uses curl or another selected method). Other functions that do use download.file() (such as package installation or updates) will benefit from setting the option, but for JD's initial query concerning CSV files over https, an explicit download.file() is needed before calling read.csv() on the downloaded file.
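The two-step pattern described in the edit can be wrapped in a small function. This is only a sketch: read_csv_via_download is a hypothetical helper name, not part of base R, and the default method = "curl" assumes the external curl program is installed and on the PATH.

```r
# Hypothetical helper: download a CSV to a temp file, read it, then clean up.
# The default method = "curl" assumes the external curl program is available.
read_csv_via_download <- function(url, method = "curl", ...) {
  temporaryFile <- tempfile(fileext = ".csv")
  # Remove the temp file when the function exits, even on error.
  on.exit(unlink(temporaryFile), add = TRUE)
  # download.file() returns 0 on success and a non-zero code otherwise.
  status <- download.file(url, destfile = temporaryFile,
                          method = method, quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }
  # Extra arguments (e.g. stringsAsFactors) are passed through to read.csv().
  read.csv(temporaryFile, ...)
}

# Usage, assuming network access and curl on the PATH:
# myData <- read_csv_via_download(
#   "https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
```

Passing method explicitly keeps the download behaviour visible at the call site, rather than relying on the download.file.method option, which read.csv() does not consult.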

Dirk Eddelbuettel answered Oct 12 '22 02:10


R Core should expose R's connections through a C API. I've proposed this in the past:

https://stat.ethz.ch/pipermail/r-devel/2006-October/043056.html

with no response.

Jeff answered Oct 12 '22 03:10