I want to persist an lm object to a file and reload it into another program. I know I can do this by writing/reading a binary file via saveRDS/readRDS, but I'd like to have an ASCII file instead of a binary file. At a more general level, I'd like to know why my idioms for reading in dput output are not behaving as I'd expect.
Below are examples of making a simple fit, and successful and unsuccessful recreations of the model:
dat_train <- data.frame(x=1:4, z=c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)
rm(dat_train) # Just to make sure fit does not depend on the existence of `dat_train`
dat_score <- data.frame(x=c(1.5, 3.5))
## This works (of course)
predict(fit, dat_score)
# 1 2
# 1.52 3.48
Saving to a binary file works:
## http://stackoverflow.com/questions/5118074/reusing-a-model-built-in-r
saveRDS(fit, "model.RDS")
fit2 <- readRDS("model.RDS")
predict(fit2, dat_score)
# 1 2
# 1.52 3.48
So does this (dput it in the R session, not to a file):
fit2 <- eval(dput(fit))
predict(fit2, dat_score)
# 1 2
# 1.52 3.48
But if I persist the model to a file on disk, I cannot figure out how to get it back into a usable object:
dput(fit, file = "model.R")
fit3 <- source("model.R")$value
# Error in is.data.frame(data): object 'dat_train' not found
predict(fit3, dat_score)
# Error in predict(fit3, dat_score): object 'fit3' not found
Trying to be explicit with the eval does not work either:
## http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string
dput(fit, file="model.R")
fit4 <- eval(parse(text=paste(readLines("model.R"), collapse=" ")))
# Error in is.data.frame(data): object 'dat_train' not found
predict(fit4, dat_score)
# Error in predict(fit4, dat_score): object 'fit4' not found
In both cases above, I expect fit3 and fit4 to work, but they are not rebuilt into an lm object that I can use with predict().
Can anyone advise me on how I can persist a model to a file in an ASCII, structure(...)-like form, and then read it back in as an lm object I can use in predict()? And why are my current methods not working?
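On the "why" part, here is a minimal sketch using the same toy fit. Two things seem to be going on. First, dput() returns its argument invisibly, so eval(dput(fit)) just hands back the original object and never parses the printed text. Second, with the default control settings the call component is written unquoted, so evaluating the file tries to re-run lm() and goes looking for dat_train. The helper squash() and the variable names below are only for illustration.

```r
dat_train <- data.frame(x = 1:4, z = c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)

# dput() returns its first argument invisibly, so this "round trip"
# never touches the text that was written out.
tmp <- tempfile(fileext = ".R")
same_object <- identical(eval(dput(fit, file = tmp)), fit)

# Illustrative helper: collapse deparsed lines and normalise whitespace
# so that line breaks in the output do not matter for the checks below.
squash <- function(lines) gsub("\\s+", " ", paste(lines, collapse = " "))

default_txt <- squash(capture.output(dput(fit)))
quoted_txt  <- squash(capture.output(
  dput(fit, control = c("quoteExpressions", "showAttributes"))
))

# Default control: the call is written as live code that source() will run.
default_has_raw_call <- grepl("call = lm(", default_txt, fixed = TRUE)

# With "quoteExpressions": the call is wrapped in quote() and is not evaluated.
quoted_has_quoted_call <- grepl("call = quote(lm(", quoted_txt, fixed = TRUE)

print(c(same_object, default_has_raw_call, quoted_has_quoted_call))
```

So the in-session version only appears to work, while the file-based versions are the first real test of the dput text.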
This is an important update!
As mentioned in the previous answer, the most challenging bit is to recover $terms as best as we can. The suggested method using terms.formula works for OP's example, but not for the following with bs() and poly():
dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20), y = rnorm(20))
library(splines)
fit <- lm(y ~ bs(x1, df = 3) + poly(x2, degree = 3) + x3, data = dat)
rm(dat)
If we follow the previous answer:
dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R")
fit1 <- source("model.R")$value
fit1$terms <- terms.formula(fit1$terms)
We will see that summary.lm and anova.lm work correctly, but not predict.lm:
predict(fit1, newdata = data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5))
Error in bs(x1, df = 3) : could not find function "bs"
This is because the ".Environment" attribute of $terms is missing. We need to set it:
environment(fit1$terms) <- .GlobalEnv
Running the predict call above again now gives a different error:
Error in poly(x2, degree = 3) :
'degree' must be less than number of unique points
This is because we are missing the "predvars" attribute, which is needed for safe / correct prediction with bs() and poly().
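To see what "predvars" carries, here is a small sketch comparing it with the plain "variables" attribute on a freshly fitted model. The seed and the names vars_txt / predvars_txt are my own additions for illustration. The "predvars" calls embed the training-time knots and polynomial coefficients, which is what lets predict() rebuild the same basis on new data.

```r
library(splines)

set.seed(1)  # arbitrary seed, only for reproducibility
dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20), y = rnorm(20))
fit <- lm(y ~ bs(x1, df = 3) + poly(x2, degree = 3) + x3, data = dat)

# "variables" holds the calls exactly as written in the formula.
vars_txt <- paste(deparse(attr(fit$terms, "variables")), collapse = " ")

# "predvars" holds the same calls with fitted parameters filled in:
# Boundary.knots etc. for bs(), coefs for poly().
predvars_txt <- paste(deparse(attr(fit$terms, "predvars")), collapse = " ")

print(attr(fit$terms, "variables"))
print(attr(fit$terms, "predvars"))
```

Without those embedded parameters, bs() and poly() would be recomputed from the new data alone, which is what the 'degree' error above is complaining about.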
A remedy is to dput this special attribute separately:
dput(attr(fit$terms, "predvars"), control = "quoteExpressions", file = "predvars.R")
then read it back and attach it:
attr(fit1$terms, "predvars") <- source("predvars.R")$value
Now running predict works correctly.
Note that the "dataClasses" attribute of $terms is also missing, but this does not seem to cause any problem for any generic functions.
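Putting the steps together, here is a sketch of the whole round trip. The helper names save_lm_ascii() and load_lm_ascii() are hypothetical (they are not from this thread or any package), the seed is arbitrary, and temporary files stand in for model.R and predvars.R. It assumes prediction only needs the pieces restored above.

```r
library(splines)

# Hypothetical helper: write the model and its "predvars" as ASCII files.
save_lm_ascii <- function(fit, model_file, predvars_file) {
  dput(fit, control = c("quoteExpressions", "showAttributes"), file = model_file)
  dput(attr(fit$terms, "predvars"), control = "quoteExpressions",
       file = predvars_file)
  invisible(fit)
}

# Hypothetical helper: read the files back and repair $terms.
load_lm_ascii <- function(model_file, predvars_file, env = globalenv()) {
  fit <- source(model_file)$value
  fit$terms <- terms.formula(fit$terms)   # rebuild the terms object
  environment(fit$terms) <- env           # where bs(), poly() are looked up
  attr(fit$terms, "predvars") <- source(predvars_file)$value
  fit
}

set.seed(1)
dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20), y = rnorm(20))
fit <- lm(y ~ bs(x1, df = 3) + poly(x2, degree = 3) + x3, data = dat)
newdat <- data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5)
before <- predict(fit, newdat)

model_file <- tempfile(fileext = ".R")
predvars_file <- tempfile(fileext = ".R")
save_lm_ascii(fit, model_file, predvars_file)
rm(dat, fit)  # make sure nothing from the original session is reused

fit1 <- load_lm_ascii(model_file, predvars_file)
after <- predict(fit1, newdat)
print(all.equal(before, after))
```

The comparison uses all.equal() rather than identical() because dput writes numbers with limited precision by default, so tiny rounding differences are to be expected.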