
readRDS() loads extra packages

Under what circumstances does the readRDS() function in R try to load packages/namespaces? I was surprised to see the following in a fresh R session:

> loadedNamespaces()
[1] "base"      "datasets"  "graphics"  "grDevices" "methods"   "stats"    
[7] "tools"     "utils"    
> x <- readRDS('../../../../data/models/my_model.rds')
There were 19 warnings (use warnings() to see them)
> loadedNamespaces()
 [1] "base"         "class"        "colorspace"   "data.table"  
 [5] "datasets"     "dichromat"    "e1071"        "earth"       
 [9] "evaluate"     "fields"       "formatR"      "gbm"         
[13] "ggthemes"     "graphics"     "grDevices"    "grid"        
[17] "Iso"          "knitr"        "labeling"     "lattice"     
[21] "lubridate"    "MASS"         "methods"      "munsell"     
[25] "plotmo"       "plyr"         "proto"        "quantreg"    
[29] "randomForest" "RColorBrewer" "reshape2"     "rJava"       
[33] "scales"       "spam"         "SparseM"      "splines"     
[37] "stats"        "stringr"      "survival"     "tools"       
[41] "utils"        "wra"          "wra.ops"      "xlsx"        
[45] "xlsxjars"     "xts"          "zoo"     

If any of those new packages isn't available, readRDS() fails.

The 19 warnings mentioned are:

> warnings()
Warning messages:
1: replacing previous import ‘hour’ when loading ‘data.table’
2: replacing previous import ‘last’ when loading ‘data.table’
3: replacing previous import ‘mday’ when loading ‘data.table’
4: replacing previous import ‘month’ when loading ‘data.table’
5: replacing previous import ‘quarter’ when loading ‘data.table’
6: replacing previous import ‘wday’ when loading ‘data.table’
7: replacing previous import ‘week’ when loading ‘data.table’
8: replacing previous import ‘yday’ when loading ‘data.table’
9: replacing previous import ‘year’ when loading ‘data.table’
10: replacing previous import ‘here’ when loading ‘plyr’
11: replacing previous import ‘hour’ when loading ‘data.table’
12: replacing previous import ‘last’ when loading ‘data.table’
13: replacing previous import ‘mday’ when loading ‘data.table’
14: replacing previous import ‘month’ when loading ‘data.table’
15: replacing previous import ‘quarter’ when loading ‘data.table’
16: replacing previous import ‘wday’ when loading ‘data.table’
17: replacing previous import ‘week’ when loading ‘data.table’
18: replacing previous import ‘yday’ when loading ‘data.table’
19: replacing previous import ‘year’ when loading ‘data.table’

So apparently it's loading something like lubridate and then data.table, generating namespace conflicts as it goes.

FWIW, unserialize() gives the same results.

What I really want is to load these objects without also loading every package the person who saved them had loaded at the time, which is what it appears to be doing.

Update: here are the classes in the object x:

> classes <- function(x) {
    cl <- c()
    for(i in x) {
      cl <- c(cl, if(is.list(i)) c(class(i), classes(i)) else class(i))
    }
    cl
  }
> unique(classes(x))
 [1] "list"              "numeric"           "rq"               
 [4] "terms"             "formula"           "call"             
 [7] "character"         "smooth.spline"     "integer"          
[10] "smooth.spline.fit"

rq is from the quantreg package; all the rest are from base or stats.

Ken Williams asked Oct 02 '13 20:10


2 Answers

OK. This may not be a fully useful answer (that would need more details), but I think it at least answers the "under what circumstances" part.

First of all, I don't think this is specific to readRDS; it works the same way with any object written by save() and read back by load().

The "under what circumstances" part: when the saved object contains an environment whose parent is a package/namespace environment, or a function whose environment is a package/namespace environment. A namespace environment is not serialized by value; only a reference to it by name is stored, so restoring the object has to load that namespace again.

require(Matrix)
foo <- list(
   a = 1,
   b = new.env(parent=environment(Matrix)),
   c = "c")
save(foo, file="foo.rda")
loadedNamespaces()   # Matrix is there!
detach("package:Matrix")
unloadNamespace("Matrix")
loadedNamespaces()   # no Matrix there!
load("foo.rda")
loadedNamespaces()   # Matrix is back again

And the following works too:

require(Matrix)
bar <- list(
   a = 1,
   b = force,
   c = "c")
environment(bar$b) <- environment(Matrix)
save(bar, file="bar.rda")
loadedNamespaces()      # Matrix is there!
detach("package:Matrix")
unloadNamespace("Matrix")
loadedNamespaces()      # no Matrix there!
load("bar.rda")
loadedNamespaces()      # Matrix is back!
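
To see which namespaces a given object would pull in before saving it, one option is to walk the object and check the environment of every function and environment it holds. This is only a minimal sketch: `namespace_of` and `find_namespaces` are hypothetical helper names, the walk descends into lists and attributes (so it also sees the `.Environment` attribute of a formula or terms object) but not into the contents of environments, and attached `package:` environments are not reported.

```r
# Hypothetical helper: walk up from `env` and return the name of the first
# namespace on the way, or NULL if the chain reaches the global or empty
# environment first.
namespace_of <- function(env) {
  while (!identical(env, emptyenv()) && !identical(env, globalenv())) {
    if (isNamespace(env)) return(unname(getNamespaceName(env)))
    env <- parent.env(env)
  }
  NULL
}

# Hypothetical helper: collect the namespace names reachable from functions,
# environments and attributes found inside `x`.
find_namespaces <- function(x) {
  found <- character()
  visit <- function(obj) {
    # Closures carry an environment; primitives return NULL here.
    env <- if (is.function(obj)) environment(obj) else if (is.environment(obj)) obj
    if (is.environment(env)) found <<- c(found, namespace_of(env))
    if (is.list(obj)) for (el in obj) visit(el)
    # Iterate over attribute values (e.g. the environment stored on a formula).
    for (a in attributes(obj)) visit(a)
  }
  visit(x)
  unique(found)
}

# Example object: one function from stats, one environment that is harmless.
obj <- list(a = 1, f = stats::sd, e = new.env(parent = baseenv()))
found <- find_namespaces(obj)
print(found)
```

For this example object the only namespace reported should be stats, because `e` hangs off baseenv() rather than a namespace.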

I haven't tried it, but there's no reason it shouldn't work the same way with saveRDS/readRDS.

As for a solution: if it does no harm to the saved objects (i.e., if you're sure the environments aren't actually needed), you can drop the package environments by setting parent.env to something harmless, such as baseenv(). Using the foo above:

parent.env(foo$b) <- baseenv()
save(foo, file="foo.rda")
loadedNamespaces()        # Matrix is there ....
unloadNamespace("Matrix")
loadedNamespaces()        # no Matrix there ...
load("foo.rda")
loadedNamespaces()        # still no Matrix ...
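
For anyone who wants to confirm this with saveRDS()/readRDS(), here is a sketch of the same experiment. It assumes the splines package (which ships with R) is available and that nothing else in the session imports it, so it can be loaded and unloaded freely; the variable names are only for illustration.

```r
# Assumption: 'splines' is installed with R and not imported by any other
# loaded namespace, so unloadNamespace() succeeds.
ns <- loadNamespace("splines")
foo <- list(a = 1, b = new.env(parent = ns), c = "c")

path <- tempfile(fileext = ".rds")
saveRDS(foo, path)
unloadNamespace("splines")
invisible(readRDS(path))
# TRUE if readRDS() brought the namespace back
reloaded <- "splines" %in% loadedNamespaces()

# Same object after replacing the parent environment
parent.env(foo$b) <- baseenv()
saveRDS(foo, path)
unloadNamespace("splines")
invisible(readRDS(path))
# TRUE if the namespace stayed unloaded this time
still_unloaded <- !("splines" %in% loadedNamespaces())

print(c(reloaded = reloaded, still_unloaded = still_unloaded))
```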
lebatsnok answered Oct 17 '22 20:10


One painful workaround I've come up with is to cleanse the object of any environments it had attached to it, by a nasty eval:

sanitizeEnvironments <- function(obj) {
    tc <- textConnection(NULL, 'w')
    dput(obj, tc)
    source(textConnection(textConnectionValue(tc)))$value
}

I can take the old object, run it through this function, then do saveRDS() on it again. Then loading the new object doesn't blow chunks all over my namespace.
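
A sketch of that round trip, for illustration only. The function is a variant of the one above that also closes its text connections, and the object is a hypothetical stand-in (a formula whose environment is the splines namespace), not the original model. It assumes splines is installed and not imported by anything else in the session.

```r
# Variant of sanitizeEnvironments() that closes both connections on exit.
sanitizeEnvironments <- function(obj) {
  tc <- textConnection(NULL, "w")
  on.exit(close(tc))
  dput(obj, tc)
  src <- textConnection(textConnectionValue(tc))
  on.exit(close(src), add = TRUE)
  # source() evaluates in the global environment by default
  source(src)$value
}

# Hypothetical stand-in for the saved object: a formula tied to a namespace.
fm <- y ~ x
environment(fm) <- loadNamespace("splines")
old <- list(model_formula = fm)

# Clean the object and save it again.
clean <- sanitizeEnvironments(old)
path <- tempfile(fileext = ".rds")
saveRDS(clean, path)

# Reading the cleaned file should not pull the namespace back in.
unloadNamespace("splines")
x <- readRDS(path)
loaded_again <- "splines" %in% loadedNamespaces()
print(loaded_again)
```

Everything rebuilt this way ends up with the global environment, and parts of an object that dput() cannot write out as parseable code (a bare environment, for instance) will not survive the trip, so it is worth checking the cleaned object before relying on it.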

Ken Williams answered Oct 17 '22 20:10