I want to monitor my memory usage in RStudio so that I can avoid getting out-of-memory errors on the cluster. I'm looking for a method to calculate peak memory usage that includes both global variables and local variables. For example, the peak memory usage should account for intermediate variables in functions and apply loops.
gc(reset = TRUE)
sum(gc()[, "(Mb)"]) # 172Mb
lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80Mb object
  mean(mx)
})
sum(gc()[, "(Mb)"]) # 172Mb -- still the same!
I found what I was looking for in the package peakRAM. From the documentation:
This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM hungry code.
mem <- peakRAM({
  for (i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # 10000486MiB
mem <- peakRAM({
  for (i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # 10005266MiB <-- almost the same!
The object returned by lapply weighs only 488 bytes because it holds only the summarized results: garbage collection has deleted the intermediate objects after the mean calculation.
help('Memory') gives useful information on how R manages memory.
In particular, you can use object.size() to track the size of individual objects, and memory.size() (Windows only, and a non-functional stub since R 4.2.0) to see how much total memory is in use at each step:
# With mean calculation
gc(reset = TRUE)
#>          used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 405777 21.7     831300 44.4   405777 21.7
#> Vcells 730597  5.6    8388608 64.0   730597  5.6
sum(gc()[, "(Mb)"])
#> [1] 27.3
l <- lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80Mb object
  mean(mx)
  print(paste('Memory used:', memory.size()))
})
#> [1] "Memory used: 271.04"
#> [1] "Memory used: 272.26"
#> [1] "Memory used: 272.26"
object.size(l)
#> 488 bytes
# Without mean calculation
gc(reset = TRUE)
#>          used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells 464759 24.9     831300  44.4   464759 24.9
#> Vcells 864034  6.6   29994700 228.9   864034  6.6
gcinfo(TRUE)
#> [1] FALSE
sum(gc()[, "(Mb)"])
#> [1] 31.5
l <- lapply(1:4, function(x) {
  mx <- replicate(10, rnorm(1e6))
  print(paste('New object size:', object.size(mx)))
  print(paste('Memory used:', memory.size()))
  mx
})
#> [1] "New object size: 80000216"
#> [1] "Memory used: 272.27"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 348.58"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 424.89"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 501.21"
object.size(l)
#> 320000944 bytes
sum(gc()[, "(Mb)"])
#> [1] 336.7
Created on 2020-08-20 by the reprex package (v0.3.0)
If, instead of returning the mean, you return the whole object, the increase in memory use is significant.
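Since memory.size() is not available everywhere, here is a rough, portable sketch that relies only on gc(). Note that gc()[, "(Mb)"] selects the first column named "(Mb)", which belongs to "used" (current usage), so it cannot show a peak; the peak is in the "(Mb)" column to the right of "max used". The helper name peak_mb is hypothetical, not part of any package, and the figure is only an approximation because R updates "max used" at garbage-collection time.

```r
# Hypothetical helper (not from any package): approximate peak memory, in Mb,
# reached while evaluating `expr`, based on the "max used" statistics of gc().
peak_mb <- function(expr) {
  max_used_mb <- function(g) {
    # Locate "max used" by name; the column right after it is its "(Mb)" value.
    # Looking it up by name also works when gc() prints an extra "limit" column.
    i <- which(colnames(g) == "max used")
    sum(g[, i + 1])
  }
  before <- max_used_mb(gc(reset = TRUE)) # reset the "max used" counters
  force(expr)                             # evaluate the code under test (lazy argument)
  after <- max_used_mb(gc())              # read the peak reached since the reset
  after - before
}

# Same workload as in the question: the 80 MB matrix exists only inside the function.
peaks <- peak_mb(lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6))
  mean(mx)
}))
peaks
```

Unlike sum(gc()[, "(Mb)"]) taken before and after, this should report a clearly non-zero value for the example above, since the intermediate objects count toward "max used" even though they are gone by the time lapply returns.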