I know there are plenty of similar questions with accepted answers (here, here, or even this one), but so far I have found no clear answer on how to free memory without restarting the R session.
I know one could save the workspace, restart R, and load the workspace back, but I am looking for a way to avoid that.
The accepted answers don't seem to work for me: even after removing half of the biggest objects in my workspace (thanks to this great answer) and running gc(), top still reports exactly the same percentage of memory used.
Here, in a comment, it says:
R's garbage collection "marks" the RAM as available. Up to your OS to reclaim that.
That sounds plausible, but I am not sure it really happens: top still shows the same amount of memory used by R even after rm() and gc(), even after starting other processes in the OS, and even after 2 hours, 10 hours, or 3 days. This comment suggests that it has to do with loaded libraries and graphics devices, but why? And how can I solve it?
If I rm() a 3GB object and then use gc() to free the memory, how is it possible that R still uses the same percentage of memory?
Using the gc() function to release memory: gc() triggers a garbage collection, which frees the memory held by objects that are no longer referenced. reset is an optional parameter; when TRUE, it resets the "max used" statistics. The returned table reports memory usage in Mb.
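For instance, a minimal sketch (my own illustration; the numbers gc() prints will vary by session):
x <- double(1e7)     # allocate a vector of roughly 76 MiB
rm(x)                # drop the only reference to it
gc()                 # collect; the "max used" columns still show the session peak
gc(reset = TRUE)     # collect again and reset the "max used" statistics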
You can do both by restarting your R session in RStudio with the keyboard shortcut Ctrl+Shift+F10, which clears your global environment of objects and unloads any loaded packages.
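If you prefer to trigger that restart from code, here is a minimal sketch using the rstudioapi package (my addition, not part of the original answer; it only works when running inside RStudio):
# install.packages("rstudioapi")
rstudioapi::restartSession()  # same effect as Ctrl+Shift+F10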
You can force R to perform a garbage-collection check, and free unused memory right away, by running gc() in the console or by going to Tools -> Memory -> Free Unused R Memory in RStudio. Read more about garbage collection in R.
R probably uses more memory because of copying of objects. Although these temporary copies get deleted, R still occupies the space. To give this memory back to the OS you can call the gc function; in any case, gc is called automatically when more memory is needed.
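You can watch such copies being made with base R's tracemem, assuming your R build has memory profiling enabled (the CRAN binaries do); a small illustration of my own:
x <- double(1e+07)
tracemem(x)       # report whenever this object is duplicated
y <- x            # no copy yet: x and y share one block
y[1] <- 0         # copy-on-modify: a duplicate is made here, and tracemem reports it
untracemem(y)     # stop tracing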
Garbage collection is "complicated". If x is a variable bound in an environment e, then rm(x, pos = e); gc() does not necessarily free object.size(e$x) bytes for use by the OS.
That is because R objects are just pointers to blocks of memory. If multiple objects point to the same block of memory, then you need to remove all of them to make that memory available for garbage collection. That can be hard to do if your global environment binds a large number of variables, possibly recursively if you make frequent use of lists (including data frames), pairlists, and environments (including function evaluation environments).
Here is an example, which I've run on a machine with 8 GB RAM running Ubuntu 20.04. (It should be reproducible on most Unix-alikes, but not on Windows, due to the Unix command in the system call.)
$ R --vanilla
## Force garbage collection then output the amount of memory
## being used by R, as seen by R ('gc') and by the OS ('ps')
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.6
## Allocate a large block of memory and create multiple
## references to it
x <- double(1e+08)
y <- x
l <- list(x = x)
p <- pairlist(x = x)
e <- new.env(); e$x <- x
f <- (function(x) {force(x); function(x) x})(x)
usage()
## gc (MiB) ps (%)
## 786.1 10.3
## Apply 'object.size' to each object in current environment
## and scale from bytes to mebibytes
0x1p-20 * unlist(eapply(environment(), object.size))
## x y usage e f l p
## 7.629395e+02 7.629395e+02 1.787567e-02 5.340576e-05 1.106262e-03 7.629398e+02 7.629396e+02
## Remove references to 'double(1e+08)' one by one
rm(x); usage()
## gc (MiB) ps (%)
## 786.1 10.3
rm(y); usage()
## gc (MiB) ps (%)
## 786.1 10.3
l$x <- NULL; usage()
## gc (MiB) ps (%)
## 786.1 10.3
p$x <- NULL; usage()
## gc (MiB) ps (%)
## 786.1 10.3
rm(x, pos = e); usage()
## gc (MiB) ps (%)
## 786.1 10.3
rm(x, pos = environment(f)); usage()
## gc (MiB) ps (%)
## 23.2 0.6
This example shows that object.size is not a reliable means of determining what variables you need to remove in order to return a certain block of memory to the OS. To actually free the ~760 MiB (~800 MB) allocated for double(1e+08), it was necessary to remove six references: x, y, l$x, p$x, e$x, and environment(f)$x.
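If you need to check which variables share a block, a small sketch using the lobstr package (my addition; the answer itself does not use it) can be more informative than object.size:
# install.packages("lobstr")
x <- double(1e+06)
y <- x
lobstr::obj_addr(x) == lobstr::obj_addr(y)  # TRUE: both names point to the same block
lobstr::obj_size(x, y)                      # counts the shared block once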
Your observation that gc appears to do nothing only in long-running R processes with many variables bound in the global environment makes me suspect that you have removed some but not all references to the blocks of memory that you are trying to free. I wouldn't jump to the conclusion that the garbage collector is behaving incorrectly, especially without a minimal reproducible example.
Issues with memory deallocation on Linux have been discussed on the R-devel mailing list and on Bugzilla, and the topic is even covered in the R FAQ.
To summarize, it turns out that there is an issue on Linux, but it is due to a limitation of glibc that is outside of R's control. Specifically, when glibc allocates then deallocates many small blocks of memory, you can end up with a fragmented heap from which the OS is unable to reclaim unused memory.
We can reproduce the issue in R by creating a long list of short atomic vectors, rather than one very long atomic vector:
$ R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.6
x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 15.9
rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 15.8
Indeed, the OS is unable to reclaim most of the memory that was occupied by x and its elements: it continues to reserve ~15% of RAM for the R process, even though only ~23 MiB of that memory is actually in use.
(That is on my Linux machine. On my Mac, which has twice as much RAM, the percentage memory used as reported by the OS changes from 0.4 to 6.2 to 1.2.)
A few work-arounds were suggested in the mailing list threads:
Set environment variables to tune the behaviour of glibc. No advice or example was provided, so you'll have to do a deep dive to figure this out. You might start with the mallopt man-page.
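For example, the mallopt(3) man-page documents tunables that glibc reads from the environment, such as MALLOC_TRIM_THRESHOLD_ and MALLOC_ARENA_MAX; a hypothetical (untested) starting point, with values chosen purely for illustration:
$ MALLOC_ARENA_MAX=1 MALLOC_TRIM_THRESHOLD_=65536 R --vanilla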
Instruct R to use an allocator other than glibc's malloc, such as jemalloc or tcmalloc. Luke Tierney wrote:
... it is possible to use alternate malloc implementations, either rebuilding R to use them or using LD_PRELOAD. On Ubuntu for example, you can have R use jemalloc with
sudo apt-get install libjemalloc1
env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 R
This does not seem to hold onto memory to the same degree, but I don't know about any other aspect of its performance.
Explicitly call the glibc utility malloc_trim to instruct the OS to reclaim unused memory where possible. The malloc_trim man-page says:
Since glibc 2.8 this function frees memory in all arenas and in all chunks with whole free pages.
which seems promising!
Dmitry Selivanov compared malloc, jemalloc, tcmalloc, and malloc+malloc_trim here. They showed convincingly that all of jemalloc, tcmalloc, and malloc+malloc_trim can help mitigate the fragmentation issues seen with malloc. One caveat: none of the malloc alternatives is a panacea. They rarely performed worse than malloc, but they did not always perform better.
I retried the above replicate example using each of the malloc alternatives in turn. In this (nongeneralizable) experiment, jemalloc and tcmalloc did not perform much better than malloc, while malloc+malloc_trim allowed the OS to reclaim all deallocated memory. Here are the libraries that I used:
libc6 version 2.31-0ubuntu9.2
libjemalloc2 version 5.2.1-1ubuntu1
libtcmalloc-minimal4 version 2.7-1ubuntu2
See below for results.
jemalloc
$ sudo apt install libjemalloc2
$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.6
x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 13.9
rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 9.4
tcmalloc
$ sudo apt install libtcmalloc-minimal4
$ env LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.7
x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 13.8
rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 13.8
malloc+malloc_trim, via Simon Urbanek's mallinfo::malloc.trim
$ R --vanilla
usage <- function() {
    m1 <- sum(gc(FALSE)[, "(Mb)"])
    m2 <- as.double(system(paste("ps -p", Sys.getpid(), "-o pmem="), intern = TRUE))
    c(`gc (MiB)` = m1, `ps (%)` = m2)
}
usage()
## gc (MiB) ps (%)
## 19.0 0.7
x <- replicate(1e+06, runif(100), simplify = FALSE)
usage()
## gc (MiB) ps (%)
## 847.1 15.9
rm(x)
usage()
## gc (MiB) ps (%)
## 23.2 15.8
## install.packages("mallinfo", repos = "http://www.rforge.net/")
mallinfo::malloc.trim(0L)
usage()
## gc (MiB) ps (%)
## 23.2 0.6