How to correctly `dput` a fitted linear model (by `lm`) to an ASCII file and recreate it later?

I want to persist an lm object to a file and reload it in another program. I know I can do this by writing and reading a binary file via saveRDS/readRDS, but I'd like an ASCII file instead. More generally, I'd like to know why my idioms for reading dput output back in are not behaving as I'd expect.

Below are examples of making a simple fit, and successful and unsuccessful recreations of the model:

dat_train <- data.frame(x=1:4, z=c(1, 2.1, 2.9, 4))
fit <- lm(z ~ x, dat_train)
rm(dat_train) # Just to make sure fit does not depend on `dat_train` still existing

dat_score <- data.frame(x=c(1.5, 3.5))

## This works (of course)
predict(fit, dat_score)
#    1    2 
# 1.52 3.48

Saving to binary file works:

## http://stackoverflow.com/questions/5118074/reusing-a-model-built-in-r
saveRDS(fit, "model.RDS")
fit2 <- readRDS("model.RDS")
predict(fit2, dat_score)
#    1    2 
# 1.52 3.48

So does this (dput within the R session, not to a file):

fit2 <- eval(dput(fit))
predict(fit2, dat_score)
#    1    2 
# 1.52 3.48

But if I persist the model to a file on disk, I cannot figure out how to get it back into a usable shape:

dput(fit, file = "model.R")
fit3 <- source("model.R")$value

# Error in is.data.frame(data): object 'dat_train' not found

predict(fit3, dat_score)
# Error in predict(fit3, dat_score): object 'fit3' not found

Trying to be explicit with the eval does not work either:

## http://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string
dput(fit, file="model.R")
fit4 <- eval(parse(text=paste(readLines("model.R"), collapse=" ")))

# Error in is.data.frame(data): object 'dat_train' not found

predict(fit4, dat_score)
# Error in predict(fit4, dat_score): object 'fit4' not found

In both cases above, I expected fit3 and fit4 to work, but neither is rebuilt into an lm object that I can use with predict().

Can anyone advise me on how to persist a model to an ASCII file in structure(...) form, and then read it back in as an lm object I can use in predict()? And why are my current methods not working?

mpettis asked Jan 13 '17 23:01

1 Answer

This is an important update!

As mentioned in the previous answer, the most challenging part is recovering $terms as best we can. The suggested method using terms.formula works for OP's example, but not for the following model with bs() and poly():

dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20), y = rnorm(20))
library(splines)
fit <- lm(y ~ bs(x1, df = 3) + poly(x2, degree = 3) + x3, data = dat)
rm(dat)

If we follow the previous answer:

dput(fit, control = c("quoteExpressions", "showAttributes"), file = "model.R") 
fit1 <- source("model.R")$value
fit1$terms <- terms.formula(fit1$terms)
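As an aside on the "quoteExpressions" control used above: an lm object stores its call as an unevaluated expression, and unless that expression is written out wrapped in quote(), reading the text back runs lm() again. That appears consistent with the "object 'dat_train' not found" error in the question. The sketch below is only an illustration under stated assumptions: `obj` is a hypothetical stand-in list rather than a real fit, and deparse() is used because it accepts the same control options as dput().

```r
# Hypothetical stand-in for the `call` component of an lm object.
# `dat_train` is deliberately NOT defined in this session.
obj <- list(call = quote(lm(formula = z ~ x, data = dat_train)))

# Deparse without and with "quoteExpressions"
plain  <- paste(deparse(obj, control = "showAttributes"), collapse = " ")
quoted <- paste(deparse(obj, control = c("quoteExpressions", "showAttributes")),
                collapse = " ")

cat(plain, "\n")
cat(quoted, "\n")

# Evaluating the plain text tries to re-run lm(), which fails without the data
res_plain <- tryCatch(eval(parse(text = plain)), error = function(e) e)

# Evaluating the quoted text only rebuilds the stored, unevaluated call
res_quoted <- eval(parse(text = quoted))

inherits(res_plain, "error")
is.call(res_quoted$call)
```

With the quoted form, the stored call comes back as a call object instead of being executed, which is why the file can be sourced after the training data has been removed.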

We will see that summary.lm and anova.lm work correctly, but not predict.lm:

predict(fit1, newdata = data.frame(x1 = 0.5, x2 = 0.5, x3 = 0.5))

Error in bs(x1, df = 3) : could not find function "bs"

This is because the ".Environment" attribute of $terms is missing. We need

environment(fit1$terms) <- .GlobalEnv
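A minimal sketch of what a missing environment looks like, using toy formulas rather than the model above (`f_parsed`, `f_eval` and `env_before` are illustrative names only). It assumes nothing beyond base R: a formula that was never evaluated carries no ".Environment" attribute, while evaluating `~` records the environment it was created in, and `environment<-` can attach one afterwards.

```r
# An unevaluated formula is just a call to `~` with no environment attached
f_parsed <- quote(y ~ x)
env_before <- environment(f_parsed)
is.null(env_before)

# Evaluating `~` creates a formula that remembers where it was made
f_eval <- y ~ x
is.environment(environment(f_eval))

# The same kind of assignment as used on fit1$terms attaches one by hand
environment(f_parsed) <- globalenv()
identical(environment(f_parsed), globalenv())
```

Setting the environment to the global environment means that functions on the search path, such as bs() after library(splines), can be looked up from there.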

Now if we run the above predict call again, we see a different error:

Error in poly(x2, degree = 3) : 
  'degree' must be less than number of unique points

This is because we are missing the "predvars" attribute, which is needed for safe / correct prediction with bs() and poly().

A remedy is to dput this special attribute separately:
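To see what "predvars" carries, the sketch below refits the same model and compares it with the "variables" attribute. The seed is an arbitrary assumption added for reproducibility, and the positions `[[3]]` and `[[4]]` rely on the order of terms in this particular formula (element 1 is `list`, element 2 is the response).

```r
library(splines)

set.seed(1)  # arbitrary seed, not part of the original example
dat <- data.frame(x1 = runif(20), x2 = runif(20), x3 = runif(20), y = rnorm(20))
fit <- lm(y ~ bs(x1, df = 3) + poly(x2, degree = 3) + x3, data = dat)

# "variables": the calls exactly as written in the formula
vars <- attr(terms(fit), "variables")

# "predvars": the same calls with data-dependent settings from the
# training data filled in
predvars <- attr(terms(fit), "predvars")

print(vars[[4]])             # poly() call as typed
print(names(predvars[[4]]))  # argument names of the stored poly() call
print(names(predvars[[3]]))  # argument names of the stored bs() call
```

The stored poly() call gains a `coefs` argument and the stored bs() call gains `knots` and `Boundary.knots`, so prediction reuses the basis built from the training data instead of rebuilding it from a single new row.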

dput(attr(fit$terms, "predvars"), control = "quoteExpressions", file = "predvars.R")

Then read it back and attach it:

attr(fit1$terms, "predvars") <- source("predvars.R")$value

Now running predict works correctly.

Note that the "dataClasses" attribute of $terms is also missing, but this does not seem to cause any problem for the generic functions tried here.

Zheyuan Li answered Feb 16 '23 01:02