 

freeing up memory in R

In R, I am trying to combine and convert several sets of time series data from http://www.truefx.com/?page=downloads into an xts object. However, the files are large and there are many of them, which is causing me issues on my laptop. They are stored as CSV files that have been compressed into zip files.

Downloading and unzipping them is easy enough (although it takes up a lot of space on the hard drive).

Loading the 350MB+ files for one month's worth of data into R is reasonably straightforward with the new fread() function in the data.table package.
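
Roughly, that loading step looks like the sketch below; the file name and column names are illustrative placeholders, and the header-less pair/timestamp/bid/ask layout is an assumption about the TrueFX format.

library(data.table)

# Sketch only: read one month of tick data; the file name and the assumed
# column layout (pair, timestamp, bid, ask, no header) are placeholders.
ticks <- fread("EURUSD-2013-01.csv", header = FALSE)
setnames(ticks, c("pair", "timestamp", "bid", "ask"))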

Some data.table transformations are done (inside a function) so that the timestamps can be read easily and a mid column is produced. Then the data.table is saved as an RData file on the hard drive, all references to the data.table object are removed from the workspace, and gc() is run after the removal. However, when I look at the R session in Activity Monitor (on a Mac), it still appears to be taking up almost 1GB of RAM, and things seem a bit laggy. I was intending to load several years' worth of the CSV files at the same time, convert them to usable data.tables, combine them and then create a single xts object, which seems infeasible if just one month uses 1GB of RAM.
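
A simplified sketch of that conversion step is below; the column names, timestamp format and function name are assumptions for illustration, not the actual code.

library(data.table)

# Sketch: parse the timestamp, add a mid column, save the table to disk,
# then drop the reference and run the garbage collector.
convert_month <- function(csv_file, rdata_file){
  dt <- fread(csv_file, header = FALSE)
  setnames(dt, c("pair", "timestamp", "bid", "ask"))   # assumed layout
  dt[, timestamp := as.POSIXct(timestamp, format = "%Y%m%d %H:%M:%OS", tz = "GMT")]
  dt[, mid := (bid + ask) / 2]
  save(dt, file = rdata_file)
  rm(dt)
  gc()
  invisible(NULL)
}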

I know I can sequentially download each file, convert it, save it, shut down R and repeat until I have a bunch of RData files that I can just load and bind, but I was hoping there might be a more efficient way to do this, so that after removing all references to a data.table, memory usage drops back to "normal" or startup levels. Are there better ways of clearing memory than gc()? Any suggestions would be greatly appreciated.
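
For reference, the "load and bind" step would look roughly like this; the directory name and column names are placeholders carried over from the sketches above.

library(data.table)
library(xts)

# Sketch: load each saved month, stack the tables, then build a single
# xts object indexed by the parsed timestamps.
files  <- list.files("rdata", pattern = "\\.RData$", full.names = TRUE)
months <- lapply(files, function(f){
  load(f)   # restores the object saved under the name 'dt'
  dt
})
all_ticks <- rbindlist(months)
prices    <- xts(as.matrix(all_ticks[, .(bid, ask, mid)]),
                 order.by = all_ticks$timestamp)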

asked Jan 22 '13 by h.l.m


People also ask

How do I free up memory in R?

You can force R to perform this check, and free the memory right away, by running the gc() command, or (in RStudio) by going to Tools -> Memory -> Free Unused R Memory.
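
For example:

x <- rnorm(1e7)   # roughly 80 MB of doubles
rm(x)             # remove the only reference to the vector
gc()              # run the garbage collector and print a memory-usage summary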

Why is R taking up so much memory?

R often uses more memory because objects get copied. Although these temporary copies are deleted, R may still hold on to the space. To give this memory back to the OS you can call the gc() function; however, when memory is needed, gc() is called automatically anyway.
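
Copy-on-modify can be seen with tracemem(), which is available when R is built with memory profiling (the default for the CRAN binaries):

x <- runif(1e6)
tracemem(x)   # prints the object's address so copies can be spotted
y <- x        # no copy yet: both names point to the same vector
y[1] <- 0     # copy-on-modify: tracemem reports a duplication here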

Does gc() clear memory?

GC automatically releases memory when an object is no longer used. It does this by tracking how many names point to each object, and when there are no names pointing to an object, it deletes that object. Despite what you might have read elsewhere, there's never any need to call gc() yourself.

Does R have memory management?

Yes; R has been improving its performance and memory management for a very long time.


1 Answer

In my project I had to deal with many large files. I organized the routine around the following principles:

  1. Isolate memory-hungry operations in separate R scripts.
  2. Run each script in a new process, which is destroyed after execution, so the system gets the used memory back.
  3. Pass parameters to the scripts via a text file.

Consider the toy example below.

Data generation:

setwd("/path/to")
write.table(matrix(1:5e7, ncol=10), "temp.csv") # 465.2 Mb file

slave.R - the memory-consuming part

setwd("/path/to")
library(data.table)

# simple processing: copy the table and add a column
f <- function(dt){
  dt <- dt[1:nrow(dt),]   # subsetting every row forces a copy
  dt[,new.row:=1]         # add a constant column by reference
  return(dt)
}

# reads parameters from file
csv <- read.table("io.csv")
infile  <- as.character(csv[1,1])
outfile <- as.character(csv[2,1])

# memory-hungry operations
dt <- as.data.table(read.table(infile))   # read.table matches the write.table() format used for temp.csv
dt <- f(dt)
write.table(dt, outfile)

master.R - executes slaves in separate processes

setwd("/path/to")

# process 3 files
for(i in 1:3){
  # sets iteration-specific parameters for the slave script
  csv <- c("temp.csv", paste("temp", i, ".csv", sep=""))
  write.table(csv, "io.csv")

  # runs the slave in a fresh R process; its memory is returned to the OS when it exits
  system("R -f slave.R")
}
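
A variation on the same idea (a sketch, not part of the original routine): the parameters can be passed on the command line with Rscript and read with commandArgs(), instead of going through io.csv.

# master.R variant: pass the file names as command-line arguments
for(i in 1:3){
  infile  <- "temp.csv"
  outfile <- paste("temp", i, ".csv", sep="")
  system(paste("Rscript slave.R", infile, outfile))
}

# slave.R variant: read the two file names passed by master.R
args    <- commandArgs(trailingOnly=TRUE)
infile  <- args[1]
outfile <- args[2]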
answered Sep 22 '22 by redmode