Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the fastest way to get CSV output into a data frame?

Tags:

r

I have a program that outputs lines of CSV data that I want to load into a data frame. I currently load the data like so:

tmpFilename <- "tmp_file"
system(paste(procName, ">", tmpFilename), wait=TRUE)
myData <- read.csv(tmpFilename) # (I also pass in colClasses and nrows for efficiency)

However, I thought redirecting the output to a file just to read from it was inefficient (the program spits out about 30MB, so I want to handle it with optimal performance). I thought textConnection would solve this, so I tried:

con <- textConnection(system(procName, intern=TRUE))
myData <- read.csv(con)

This runs a lot slower, though, and whereas the first solution degrades linearly with input size, the textConnection solution's performance degrades exponentially it seems. The slowest part is creating the textConnection. read.csv here actually completes quicker than in the first solution since it's reading from memory.

My question is then, is creating a file just to run read.csv on it my best option with respect to speed? Is there a way to speed up the creation of a textConnection? bonus: why is creating a textConnection so slow?

like image 702
Hudon Avatar asked May 17 '13 03:05

Hudon


1 Answers

The "fastest way" will probably involve using something other than read.csv. However, sticking with read.csv, using pipe may be the way to go:

myData <- read.csv(pipe(procName))

It avoids reading the full text output into an intermediate buffer (at least before read.csv gets ahold of it).

Some timing comparisons:

> write.csv(data.frame(x=rnorm(1e5)), row.names=FALSE, file="norm.csv")
> system.time(d <- read.csv("norm.csv"))
   user  system elapsed 
  0.398   0.004   0.402 
> system.time(d <- read.csv(textConnection(system("cat norm.csv", intern=TRUE))))
   user  system elapsed 
 56.159   0.106  56.095 
> system.time(d <- read.csv(pipe("cat norm.csv")))
   user  system elapsed 
  0.475   0.012   0.531 
like image 79
David F Avatar answered Sep 21 '22 22:09

David F