When I run random forest in serial it uses 8GB of RAM on my system; when I run it in parallel it uses more than twice the RAM (18GB). How can I keep it to 8GB when running in parallel? Here's the code:
install.packages('foreach')
install.packages('doSMP')
install.packages('randomForest')
library('foreach')
library('doSMP')
library('randomForest')
NbrOfCores <- 8
workers <- startWorkers(NbrOfCores) # number of cores
registerDoSMP(workers)
getDoParName() # check name of parallel backend
getDoParVersion() # check version of parallel backend
getDoParWorkers() # check number of workers
#creating data and setting options for random forests
#if you run this, please adapt it so it won't crash your system! This amount of data uses up to 18GB of RAM.
x <- matrix(runif(500000), 100000)
y <- gl(2, 50000)
#options
set.seed(1)
ntree=1000
ntree2 <- ntree/NbrOfCores
gc()
#running serialized version of random forests
system.time(
rf1 <- randomForest(x, y, ntree = ntree))
gc()
#running parallel version of random forests
system.time(
rf2 <- foreach(ntree = rep(ntree2, 8), .combine = combine,
               .packages = "randomForest") %dopar%
  randomForest(x, y, ntree = ntree))
First of all, doSMP will duplicate the input so that each worker process gets its own copy of x and y. That duplication can be avoided by using a fork-based backend such as multicore, but there is a second problem: each invocation of randomForest also makes an internal copy of the input.
The best you can do is cut some of the usage by having randomForest drop the forest itself (with keep.forest = FALSE) and doing the testing along with the training (via the xtest and, optionally, ytest arguments).
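For illustration, here is a minimal sketch (not from the original post) that combines both suggestions, reusing x, y, ntree2 and NbrOfCores from the question. It registers doMC, a fork-based multicore backend for Unix-like systems, so the workers share x and y via copy-on-write instead of receiving copies, and it passes a hypothetical held-out set xtest while dropping the forest with keep.forest = FALSE. Because no forests are kept, randomForest::combine has nothing to merge, so the sketch sums raw test-set vote counts (norm.votes = FALSE) instead.
library(doMC)
library(randomForest)
registerDoMC(NbrOfCores)  # fork-based backend: workers share x and y copy-on-write

# xtest is an assumed held-out predictor matrix with the same columns as x
votes <- foreach(ntree = rep(ntree2, NbrOfCores),
                 .combine = "+", .packages = "randomForest") %dopar% {
  rf <- randomForest(x, y, ntree = ntree,
                     xtest = xtest,        # score the test set during training
                     keep.forest = FALSE,  # do not store the trees
                     norm.votes = FALSE)   # raw vote counts, so they can be summed
  rf$test$votes
}

# final prediction for each test row: the class with the most votes overall
pred <- factor(levels(y)[max.col(votes)], levels = levels(y))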
Random forest objects can get very large even with moderately sized data sets, so the increase may be related to storing the model object.
To test this, you should really use two different sessions.
Try running another model in parallel that does not have a large footprint (lda, for example) and see whether you get the same increase in memory.
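As a rough check (a sketch only, reusing x, y and the backend registered in the question), you could fit a model with a small object footprint, such as MASS::lda, in the same foreach pattern and compare the memory reported by gc(). If memory still roughly doubles, the overhead is coming from duplicating the input to the workers rather than from storing large randomForest objects.
library(MASS)

gc(reset = TRUE)  # reset the "max used" statistics before the run
fits <- foreach(i = 1:NbrOfCores, .packages = "MASS") %dopar%
  lda(x, grouping = y)
gc()  # compare the "max used" column with the randomForest runs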