I have created a function that is called to read in and then return a data.table:
read.in.data <- function(filename)
{
  library(data.table)
  data.holder <- read.table(filename, skip = 1)
  return(data.table(data.holder))
}
Watching my RAM usage while the function runs, R appears to process this in two steps (or at least that is my best guess at what's going on). For example, when I load a 1.5 GB file (15 columns, 136 characters per row), R seems to 1) read the data in, using 1.5 GB of RAM, and then 2) use another 1.5 GB of RAM for the return.
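For what it's worth, the two allocations can also be seen from inside R with gc(). This is only a minimal sketch; "bigfile.txt" is a placeholder for the real file:
library(data.table)
gc(reset = TRUE)                                      # reset the "max used" counters
data.holder <- read.table("bigfile.txt", skip = 1)    # first copy: the data.frame
print(gc())                                           # "max used" now reflects one copy of the data
dt <- data.table(data.holder)                         # second copy made by data.table()
print(gc())                                           # "max used" grows by roughly another full copy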
Are there any tricks for writing a function that creates a data.table (or data.frame, for that matter) and returns it without duplicating it in memory? Or must I do all of the processing for the data.table within the function where it is created?
Observations: If I run this code twice in a row, the memory is not cleared, and since I only have 8 GB of RAM, the function fails. Skipping the step of storing the result of read.table in a variable (as shown below) gives no benefit, and I wouldn't want to do that anyway, since I'd like to be able to clean up the data.table before returning it. A fix to this problem would also let me process larger files without running out of memory.
short.read.trk <- function(fntrk)
{
  library(data.table)
  return(data.table(read.table(fntrk, skip = 1)))
}
If saving memory is mostly what you're after, you could convert it one column at a time:
library(data.table)

read.in.data <- function(filename)
{
  data.holder <- read.table(filename, skip = 1)

  # Seed the data.table with the first column, then release it from the data.frame
  dt <- data.table(data.holder[[1]])
  names(dt) <- names(data.holder)[1]
  data.holder[[1]] <- NULL

  # Move the remaining columns across one at a time, freeing each as soon as it is copied
  for (n in names(data.holder)) {
    set(dt, j = n, value = data.holder[[n]])   # assign the column by reference
    data.holder[[n]] <- NULL
  }
  return(dt)
}
(untested)
It won't be any faster; in fact, it's probably slower. But it should be less wasteful of memory, because only one column is duplicated at any moment: each original column is freed as soon as it has been copied over, so peak usage is roughly one full copy of the data plus one extra column, rather than two full copies.
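Alternatively, if your version of data.table provides setDT() and fread() (both are part of the package, but check your version), you can avoid the second copy altogether: setDT() converts a data.frame to a data.table by reference, and fread() reads the file straight into a data.table. A rough, untested sketch; the function names here are just for illustration:
library(data.table)

read.in.data.setdt <- function(filename)
{
  data.holder <- read.table(filename, skip = 1)
  setDT(data.holder)      # converts in place, by reference, so no second copy
  return(data.holder)     # now a data.table
}

read.in.data.fread <- function(filename)
{
  return(fread(filename, skip = 1))   # reads directly into a data.table
}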