
How to find heavy objects that are not stored in .GlobalEnv?

I am trying to find which objects are taking a lot of memory in my R session, but the problem is that the object might have been invisibly created with an unknown name in an unknown environment.

If the object is stored in .GlobalEnv or a known environment, I can easily use a strategy like ls(enviro)+get()+object.size() (see lsos on this post for example) to list all objects and their size, allowing me to identify the heavy objects.
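For reference, a minimal sketch of that strategy in base R only. The helper name env_sizes and the demo environment are made up for illustration; this is not the lsos function from the linked post.

```r
# Hypothetical helper: sizes (in bytes) of all objects bound in one known
# environment, largest first.
env_sizes <- function(env = globalenv()) {
  nms <- ls(env, all.names = TRUE)
  sizes <- vapply(
    nms,
    function(nm) {
      as.numeric(utils::object.size(get(nm, envir = env, inherits = FALSE)))
    },
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Demo on a throwaway environment rather than .GlobalEnv
demo_env <- new.env()
assign("big", numeric(1e6), envir = demo_env)
assign("small", 1:10, envir = demo_env)
demo_sizes <- env_sizes(demo_env)
print(names(demo_sizes))
```

This only works because the environment is known in advance, which is exactly the limitation described next.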

However, the object in question might not be stored in .GlobalEnv, but in some obscure environment implicitly created by an external package. In that case, how can I identify which object is using a lot of RAM?

The best case study is ggplot2 creating .last_plot in a dedicated environment. Looking under the hood one can find that it is stored in environment(ggplot2:::.store$get), so one can find it and eventually remove it. But if I didn't know that location or name a priori, would there be a way to find that there is a heavy object called .last_plot somewhere in memory?

pryr::mem_used()
#> 34.7 MB

## example: implicit creation of heavy and hidden object by ggplot
path <- tempfile() 
if(!file.exists(path)){
  saveRDS(as.data.frame(matrix(rep(1,1e07), ncol=5)), path)
}

pryr::mem_used()
#> 34.9 MB
p1 <- ggplot2::ggplot(readr::read_rds(path), ggplot2::aes(V1))
rm(p1)
pryr::mem_used()
#> 127 MB

## Hidden object is not in .GlobalEnv
ls(.GlobalEnv, all.names = TRUE)
#> [1] "path"

## Here I know where to find it: environment(ggplot2:::.store$get)
ls(all.names = TRUE, envir = environment(ggplot2:::.store$get))
#> [1] ".last_plot"

pryr::object_size(get(".last_plot", environment(ggplot2:::.store$get))$data)
#> 80 MB

## But how could I have found this otherwise?

Created on 2020-11-03 by the reprex package (v0.3.0)

Matifou asked Nov 03 '20 20:11


1 Answer

I don't think there's any existing way to do this. Even if you combine @AllanCameron's answer with my comment, and also run ls(y) on each namespace environment y computed as

ns <- loadedNamespaces()
for (x in ns) {
   y <- loadNamespace(x)
   # look at the size of everything in y
}

you still won't find all the environments. I think you could do it if you also examined every object that might contain a reference to an environment (e.g. every function, formula, list, and various exotic objects) but it would be tricky not to miss something or count things more than once.
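As a rough sketch of the "examine every function" idea, the code below collects the closure environments of functions (and of functions stored in lists) bound in a given environment, then measures what those environments hold. The helper closure_envs, the depth limit of 2, and the toy store are all assumptions made up for illustration; the toy store only imitates a getter/setter pair sharing one hidden variable, and the search deliberately ignores formulas and other exotic objects, so it can still miss things.

```r
# Hypothetical helper: closure environments reachable from the functions
# (and short lists of functions) bound directly in `env`.
closure_envs <- function(env) {
  found <- list()
  visit <- function(x, depth = 0) {
    if (is.function(x) && !is.primitive(x)) {
      fe <- environment(x)
      skip <- is.null(fe) || identical(fe, env) ||
        identical(fe, globalenv()) || identical(fe, baseenv()) ||
        identical(fe, emptyenv()) || isNamespace(fe)
      # Record each environment once, so shared closures are not counted twice
      if (!skip && !any(vapply(found, identical, logical(1), fe))) {
        found[[length(found) + 1L]] <<- fe
      }
    } else if (is.list(x) && depth < 2) {
      for (el in x) visit(el, depth + 1)
    }
  }
  for (nm in ls(env, all.names = TRUE)) {
    obj <- tryCatch(get(nm, envir = env, inherits = FALSE),
                    error = function(e) NULL)
    visit(obj)
  }
  found
}

# Toy stand-in for a package that hides data behind a getter/setter pair
make_store <- function() {
  .last <- NULL
  list(get = function() .last, set = function(value) .last <<- value)
}
pkg_like_env <- new.env()
pkg_like_env$.store <- make_store()
pkg_like_env$.store$set(numeric(1e6))

hidden <- closure_envs(pkg_like_env)
hidden_sizes <- unlist(lapply(hidden, function(e) {
  nms <- ls(e, all.names = TRUE)
  vapply(
    nms,
    function(nm) {
      as.numeric(utils::object.size(get(nm, envir = e, inherits = FALSE)))
    },
    numeric(1)
  )
}))
print(hidden_sizes)
```

The objects inside each environment are measured directly because utils::object.size does not include a function's environment in its result.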

Edited to add: Actually, pryr::object_size is pretty smart at reporting on the environments attached to objects, so we'd get close by searching namespaces. For example, to find the top 20 objects:

pryr::mem_used()
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
#> 35 MB
path <- tempfile() 
if(!file.exists(path)){
  saveRDS(as.data.frame(matrix(rep(1,1e07), ncol=5)), path)
}
pryr::mem_used()
#> 35.2 MB
p1 <- ggplot2::ggplot(readr::read_rds(path), ggplot2::aes(V1))
rm(p1)
pryr::mem_used()
#> 127 MB
envs <- c(globalenv = globalenv(),
          sapply(loadedNamespaces(), function(ns) loadNamespace(ns)))
sizes <- lapply(envs, function(e) {
  # spell out all.names rather than relying on partial argument matching
  objs <- ls(e, all.names = TRUE)
  sapply(objs, function(obj) pryr::object_size(get(obj, envir = e)))
})
head(sort(unlist(sizes), decreasing = TRUE), 20)
#>       base..__S3MethodsTable__.      utils..__S3MethodsTable__. 
#>                        96216872                        83443704 
#>       grid..__S3MethodsTable__.    ggplot2..__S3MethodsTable__. 
#>                        80945520                        80636768 
#>                  ggplot2..store             methods..classTable 
#>                        80418936                        10101152 
#>   graphics..__S3MethodsTable__.           tools..check_packages 
#>                         9325608                         5185880 
#>         compiler.inlineHandlers           methods..genericTable 
#>                         3444600                         2808440 
#>         Rcpp..__T__show:methods   colorspace..__T__show:methods 
#>                         2474672                         2447880 
#>                 Rcpp..RcppClass Rcpp..__C__C++OverloadedMethods 
#>                         2127584                         1990504 
#>            Rcpp..__C__RcppClass             Rcpp..__C__C++Field 
#>                         1982576                         1980176 
#>       Rcpp..__C__C++Constructor               Rcpp..__T__$:base 
#>                         1979992                         1939616 
#>         tools..install_packages               Rcpp..__C__Module 
#>                         1904032                         1899872

Created on 2020-11-03 by the reprex package (v0.3.0)

I don't know why those methods tables come out so large (I suspect it's because ggplot2 adds methods to those tables, so its environment gets captured); but somehow they are finding your object, because they aren't so big if I don't create it.

A hint about the issue is in the 5th object, listed as ggplot2..store (i.e. the object named .store in the ggplot2 namespace). It doesn't tell you to look in the environments of the functions in .store, but at least it gets you started.

Second edit:

Here are some tweaks to make the output a bit more readable.

# Unlist first, so we can clean up the names
sizes <- unlist(sizes)

# Replace the first dot with :::
names(sizes) <- sub(".", ":::", names(sizes), fixed = TRUE)

# Remove internal R objects
keep <- !grepl(".__", names(sizes), fixed = TRUE)
sizes <- sizes[keep]

With these changes, the output from sort(sizes, decreasing = TRUE) starts out as

                ggplot2:::.store 
                        80418936 
            base:::.userHooksEnv 
                        47855920 
                 base:::.Options 
                        45016888 
                   utils:::Rprof 
                        44958416 
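If raw byte counts are hard to scan, one further optional tweak is to format them with the formatter that utils provides for object_size values. This is only a sketch: pretty_size is a made-up helper name, and the two example values are copied from the output above.

```r
# Example values copied from the output above (sizes in bytes)
example_sizes <- c("ggplot2:::.store" = 80418936, "base:::.Options" = 45016888)

# Hypothetical helper: format each byte count via the object_size method
# of format(), one element at a time so names are preserved.
pretty_size <- function(bytes) {
  vapply(
    bytes,
    function(b) format(structure(b, class = "object_size"), units = "auto"),
    character(1)
  )
}

readable <- pretty_size(sort(example_sizes, decreasing = TRUE))
print(readable)
```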
user2554330 answered Nov 11 '22 05:11