 

Where is `ecdf` saving its object? (and how to measure it?)

Tags: memory-leaks, r

I can't figure out where R stores the data for an ecdf object. Here is some code to illustrate:

> set.seed(2016-10-30)
> x <- rnorm(1e4)
> y <- ecdf(x)
> object.size(x)
80040 bytes
> object.size(y)
3896 bytes
> rm(x)
> gc()
          used (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells  602079 32.2    1168576   62.5    750400   40.1
Vcells 1183188  9.1  299644732 2286.2 750532746 5726.2
> object.size(y)
3896 bytes
> plot(y) # still works...
> 

Since y is so small, the data must be stored somewhere else. It is obviously not in x (I removed it).

  1. It is probably stored in some environment, but how would we access it? In short: where is this data saved, and how can it be accessed?
  2. How would this affect memory.limit()? (i.e., caching or memory limits of running R processes)
asked Oct 30 '16 at 14:10 by Tal Galili


1 Answer

There is a fantastic explanation of function closures and of the enclosing, execution and calling environments in @hadley's Advanced R.
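A minimal sketch of the closure idea, independent of ecdf: a function factory (the name `make_mean_fn` is made up for illustration) whose returned function keeps its data alive in its enclosing environment. The data is reachable through `environment(f)`, while `object.size()` only measures the function itself.

```r
# Hypothetical factory, for illustration only: returns a closure that remembers `data`
make_mean_fn <- function(data) {
  # `data` lives in this call's execution environment, which becomes
  # the enclosing environment of the function returned below
  function() mean(data)
}

f <- make_mean_fn(rnorm(1e5))

ls(environment(f))               # "data": the captured vector
object.size(f)                   # small: the environment is not counted
object.size(environment(f)$data) # about 800,000 bytes for 1e5 doubles
f()                              # still works; `data` is found via the environment
```

This is the same mechanism that lets `plot(y)` keep working after `rm(x)` in the question: the closure holds its own reference to the data.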

For your specific example, as noted in the comments, object.size() leaves out the function's enclosing environment, which is where ecdf keeps the data. The size of the object together with its enclosing environment is much larger:

pryr::compare_size(y)
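To answer the "how can it be accessed?" part directly, here is a sketch that reuses the question's setup and reads the data back out of the closure's enclosing environment. It assumes the current implementation, in which `ecdf()` builds its result with `approxfun()` (so the sorted unique values sit in a binding named `x` and the cumulative proportions in `y`) and then stores the sample size as `nobs`. These binding names are internal details, not a documented API, and may differ between R versions; `knots()` is the documented accessor for the jump points.

```r
set.seed(2016-10-30)
x <- rnorm(1e4)
y <- ecdf(x)
rm(x)

# The closure's enclosing environment holds the captured data
e <- environment(y)
ls(e)           # binding names; includes "x", "y" and "nobs"

# Internal bindings (implementation detail of approxfun()/ecdf(), may change):
head(e$x)       # sorted unique observations
head(e$y)       # cumulative proportions at those points
e$nobs          # number of observations

# Documented accessor for the same jump points
head(knots(y))

# object.size() skips the environment, so it misses the bulk of the data
object.size(y)
object.size(e$x)
```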

You can see which objects this includes, and their individual sizes, with:

sapply(codetools::findGlobals(y), function(x) object.size(get(x, environment(y))))

You can sum the resulting vector to see that this is indeed what pryr::object_size reports (164 kB on my machine).
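`codetools::findGlobals()` only lists the free variables used in the function body, so it can miss values that sit in the environment without being referenced there (for ecdf, `nobs` is one such binding). A base-R sketch that walks every binding instead, using a hypothetical helper `binding_sizes()` that is not part of any package. The total is only a rough figure: `object.size()` and `pryr::object_size()` account for memory differently (e.g. shared objects), so it need not match the 164 kB above exactly.

```r
set.seed(2016-10-30)
y <- ecdf(rnorm(1e4))

# Hypothetical helper: size in bytes, per object.size(), of every binding
# in the enclosing environment of closure `f`.
# Assumes every binding holds an ordinary value (no missing arguments).
binding_sizes <- function(f) {
  e <- environment(f)
  nms <- ls(e, all.names = TRUE)
  vapply(nms,
         function(nm) as.numeric(object.size(get(nm, envir = e))),
         numeric(1))
}

sizes <- binding_sizes(y)
sort(sizes, decreasing = TRUE)   # the captured data vectors dominate

# Rough total: the closure itself plus everything its environment holds
total <- as.numeric(object.size(y)) + sum(sizes)
total
```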

answered Oct 11 '22 at 17:10 by tchakravarty