
Measure peak memory usage in R

Tags:

r

I want to monitor memory usage in RStudio so that I can avoid out-of-memory errors on the cluster. I'm looking for a way to measure peak memory usage that includes both global and local variables; for example, it should account for intermediate variables created inside functions and apply loops.

gc(reset = TRUE)
sum(gc()[, "(Mb)"]) # 172Mb

lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80Mb object
  mean(mx)
})

sum(gc()[, "(Mb)"]) # 172Mb -- still the same!
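For reference, gc() itself keeps a "max used" high-water mark that gc(reset = TRUE) resets; a minimal sketch reading it (the last Mb column of gc()'s matrix) would be:

gc(reset = TRUE)

lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80Mb object
  mean(mx)
})

g <- gc()
sum(g[, ncol(g)]) # last column is "max used" in Mb; expected to exceed 172Mb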
asked Aug 11 '20 by Jeff Bezos

2 Answers

I found what I was looking for in the package peakRAM. From the documentation:

This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM hungry code.

mem <- peakRAM({
  for(i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # ~76 MiB -- roughly one rnorm(1e7) vector (80 MB)

Running the same block a second time reports nearly the same peak, so the measurement is reproducible:

mem <- peakRAM({
  for(i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # ~76 MiB <-- almost the same!
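Since peakRAM() returns a data frame, the lapply from the question can be wrapped directly. A minimal sketch; the column names are from the package's documentation:

library(peakRAM)

mem <- peakRAM(
  lapply(1:3, function(x) {
    mx <- replicate(10, rnorm(1e6)) # 80Mb intermediate
    mean(mx)
  })
)
mem[, c("Total_RAM_Used_MiB", "Peak_RAM_Used_MiB")]
# Peak_RAM_Used_MiB should reflect the ~80Mb intermediate,
# even though the returned list of means is tiny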
answered Oct 05 '22 by Jeff Bezos


The object returned by lapply weighs only 488 bytes because it contains only the summary values: garbage collection has removed the intermediate objects once each mean was calculated.
help('Memory') gives useful information on how R manages memory.
In particular, you can use object.size() to track the size of individual objects, and memory.size() (Windows only) to see how much total memory is in use at each step:

# With mean calculation
gc(reset = TRUE)
#>          used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 405777 21.7     831300 44.4   405777 21.7
#> Vcells 730597  5.6    8388608 64.0   730597  5.6
sum(gc()[, "(Mb)"]) 
#> [1] 27.3

l <- lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80Mb object
  mean(mx)
  print(paste('Memory used:',memory.size()))
})
#> [1] "Memory used: 271.04"
#> [1] "Memory used: 272.26"
#> [1] "Memory used: 272.26"

object.size(l)
#> 488 bytes


## Without mean calculation :
gc(reset = TRUE)
#>          used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells 464759 24.9     831300  44.4   464759 24.9
#> Vcells 864034  6.6   29994700 228.9   864034  6.6
gcinfo(TRUE)
#> [1] FALSE
sum(gc()[, "(Mb)"]) 
#> [1] 31.5
l <- lapply(1:4, function(x) {
  mx <- replicate(10, rnorm(1e6))
  print(paste('New object size:',object.size(mx)))
  print(paste('Memory used:',memory.size()))
  mx
})
#> [1] "New object size: 80000216"
#> [1] "Memory used: 272.27"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 348.58"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 424.89"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 501.21"

object.size(l)
#> 320000944 bytes
sum(gc()[, "(Mb)"]) 
#> [1] 336.7

Created on 2020-08-20 by the reprex package (v0.3.0)

If, instead of returning the mean, you return the whole object, the increase in memory use is significant.
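Note that memory.size() is Windows-only (and defunct since R 4.2), so the reprex above won't run elsewhere as-is. A sketch of a cross-platform variant, assuming the lobstr package is available, swaps in lobstr::mem_used():

library(lobstr)

l <- lapply(1:4, function(x) {
  mx <- replicate(10, rnorm(1e6))
  print(object.size(mx), units = "MB") # size of the new intermediate
  print(mem_used())                    # total memory currently used by R
  mx
})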

answered Oct 05 '22 by Waldi