In R, I am trying to combine and convert several sets of time series data from http://www.truefx.com/?page=downloads into a single xts object; however, the files are large and there are many of them, so this is causing issues on my laptop. They are stored as CSV files which have been compressed into zip files.
Downloading them and unzipping them is easy enough (although it takes up a lot of space on the hard drive).
Loading the 350MB+ files for one month's worth of data into R is reasonably straightforward with the new fread() function in the data.table package.
Some data.table transformations are done (inside a function) so that the timestamps can be read easily and a mid column is produced. Then the data.table is saved as an RData file on the hard drive, all references to the data.table object are removed from the workspace, and gc() is run after removal. However, when looking at the R session in my Activity Monitor (on a Mac), it still appears to be taking up almost 1GB of RAM, and things seem a bit laggy. I was intending to load several years' worth of the csv files at the same time, convert them to usable data.tables, combine them, and then create a single xts object, which seems infeasible if just one month uses 1GB of RAM.
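For context, a minimal sketch of the per-month routine being described, assuming the TrueFX tick files have pair/timestamp/bid/ask columns; the file names, column names and timestamp format string below are placeholders, not the actual code:
library(data.table)
# hypothetical helper: load one month's csv, transform it, save it, drop it
process_month <- function(csv.file, rdata.file){
  dt <- fread(csv.file)                                    # ~350MB+ tick file
  setnames(dt, c("pair", "timestamp", "bid", "ask"))       # assumed TrueFX layout
  dt[, timestamp := as.POSIXct(timestamp, format="%Y%m%d %H:%M:%OS", tz="GMT")]
  dt[, mid := (bid + ask)/2]                               # mid column
  save(dt, file=rdata.file)                                # keep the result on disk
  rm(dt)                                                   # drop the only reference...
  gc()                                                     # ...and trigger a collection
}
process_month("EURUSD-2013-01.csv", "EURUSD-2013-01.RData")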
I know I can sequentially download each file, convert it, save it, shut down R and repeat until I have a bunch of RData files that I can just load and bind, but I was hoping there might be a more efficient way to do this, so that after removing all references to a data.table you get back to "normal" or startup levels of RAM usage. Are there better ways of clearing memory than gc()? Any suggestions would be greatly appreciated.
You can force R to perform this garbage-collection check, and free the memory right away, by running the gc() command in R or, in RStudio, by going to Tools -> Memory -> Free Unused R Memory. Read more about Garbage Collection in R.
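For example, gc() can be called directly; it triggers a collection and reports how much memory R is currently using (the numbers below are illustrative only):
gc()
#          used (Mb) gc trigger  (Mb) max used  (Mb)
# Ncells  595467 31.9    1264348  67.6   940869  50.3
# Vcells 1119474  8.6    8388608  64.0  1929200  14.8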
R uses more memory probably because of some copying of objects. Although these temporary copies get deleted, R still occupies the space. To give this memory back to the OS you can call the gc function. However, when the memory is needed, gc is called automatically.
GC automatically releases memory when an object is no longer used. It does this by tracking how many names point to each object, and when there are no names pointing to an object, it deletes that object. Despite what you might have read elsewhere, there's never any need to call gc() yourself.
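A short illustration of both points, using base R's tracemem() to watch for the copies mentioned above (sizes are approximate and the exact copying behaviour depends on the R version):
x <- matrix(1:5e7, ncol = 10)   # a large integer matrix (~200 MB)
tracemem(x)                      # ask R to report whenever this object is duplicated
y <- x                           # no copy yet: both names point to the same object
y[1, 1] <- 0L                    # copy-on-modify: tracemem reports a duplication here
rm(x, y)                         # no names point to either object any more...
gc()                             # ...so the collector can release the memory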
So the gist of the matter is that R has been improving performance and memory management for a very long time.
In my project I had to deal with many large files. I organized the routine on the following principle: the memory-hungry operations are isolated in separate R scripts, each run in its own R process, so that the memory is given back to the OS when that process exits. Consider the toy example below.
Data generation:
setwd("/path/to")
write.table(matrix(1:5e7, ncol=10), "temp.csv") # 465.2 Mb file
slave.R - the memory-consuming part
setwd("/path/to")
library(data.table)
# simple processing
f <- function(dt){
  dt <- dt[1:nrow(dt),]   # subset every row (creates a copy)
  dt[, new.row := 1]      # add a column by reference
  return(dt)
}
# reads parameters from file
csv <- read.table("io.csv")
infile <- as.character(csv[1,1])
outfile <- as.character(csv[2,1])
# memory-hungry operations
dt <- as.data.table(read.table(infile))  # read.table, since the input was written with write.table
dt <- f(dt)
write.table(dt, outfile)
master.R - executes slaves in separate processes
setwd("/path/to")
# 3 files processing
for(i in 1:3){
  # sets iteration-specific parameters
  csv <- c("temp.csv", paste("temp", i, ".csv", sep=""))
  write.table(csv, "io.csv")
  # executes slave process
  system("R -f slave.R")
}
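After master.R finishes, the processed pieces can be read back and stacked in a fresh R session; the sketch below is an assumption about how the combining step might look (using rbindlist from data.table), not part of the answer above:
library(data.table)
# output files produced by the slave runs above
files <- paste("temp", 1:3, ".csv", sep = "")
# read each piece back and stack them into one data.table
combined <- rbindlist(lapply(files, function(f) as.data.table(read.table(f))))
# with the real tick data, the single xts object could then be built via
# xts::xts(), ordered by the parsed timestamp column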