 

How do I store lm object in a data frame in R [duplicate]

Tags:

r

lm

I need to store lm fit objects in a data frame for further processing (I will have around 200+ regressions to store in the data frame). I am not able to store the fit object in the data frame. The following code produces an error:

x = runif(100)
y = 2*x+runif(100)
fit = lm(y ~x)

df = data.frame()
df = rbind(df, c(id="xx1", fitObj=fit))

Error in rbind(deparse.level, ...) : 
  invalid list argument: all variables should have the same length
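One quick way to see what rbind() actually receives is to inspect the result of c(). The sketch below reuses the question's setup; the seed is added only for reproducibility. Because an lm object is itself a list, c() splices its components into one long list whose elements have different lengths, and that mismatch appears to be what rbind() is complaining about:

```r
set.seed(1)  # added only so the example is reproducible
x <- runif(100)
y <- 2 * x + runif(100)
fit <- lm(y ~ x)

# c() does not keep `fit` as a single value: an lm object is a list,
# so its components are spliced into one long list
row <- c(id = "xx1", fitObj = fit)

class(row)         # "list"
head(names(row))   # "id" "fitObj.coefficients" "fitObj.residuals" ...
lengths(row)[1:3]  # 1, 2 and 100: the "variables" do not have the same length
```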

I would like to get a data frame like the one returned by a dplyr "do" call, for example:

> tacrSECOutput
Source: local data frame [24 x 5]
Groups: <by row>

                            sector control     id1     fit count
1  Chemicals and Chemical Products       S tSector <S3:lm>  2515
2     Construation and Real Estate       S tSector <S3:lm>   985

Please note that this is sample output only. I would like to create a data frame in the above format (with a fit column holding the lm objects) so that the rest of my code can work on the added models.
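For comparison, the grouped layout above (one row per group, an lm object in a fit column, plus a count) can be sketched in base R with split() and lapply(). The data frame dat and its columns sector, x and y are made up for illustration and are not taken from the question:

```r
set.seed(1)

# Hypothetical data: three sectors with 30 observations each
dat <- data.frame(
  sector = rep(c("A", "B", "C"), each = 30),
  x = runif(90),
  stringsAsFactors = FALSE
)
dat$y <- 2 * dat$x + runif(90)

# One sub-data-frame per sector; the list names are the sector labels
pieces <- split(dat, dat$sector)

# One row per sector, with the group size as a count column
out <- data.frame(
  sector = names(pieces),
  count = unname(vapply(pieces, nrow, integer(1))),
  stringsAsFactors = FALSE
)

# Fit one model per sector and store the fits as a list column
out$fit <- lapply(pieces, function(d) lm(y ~ x, data = d))

class(out$fit[[1]])  # "lm"
```

Assigning a list to out$fit is what makes the column hold whole model objects instead of trying to spread their components over several columns.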

What am I doing wrong? Any help is much appreciated.

kishore asked Dec 17 '25 20:12



1 Answer

The list approach:

This is clearly based on @Pascal's idea. I'm not a fan of lists, but in some cases they are extremely helpful.

set.seed(42)
x <- runif(100)
y <- 2*x+runif(100)
fit1 <- lm(y ~x)

set.seed(123)
x <- runif(100)
y <- 2*x+runif(100)
fit2 <- lm(y ~x)


# manually select model names
model_names = c("fit1","fit2")

# create a list based on models names provided
list_models = lapply(model_names, get)

# set names
names(list_models) = model_names

# check the output
list_models

# $fit1
# 
# Call:
#   lm(formula = y ~ x)
# 
# Coefficients:
#   (Intercept)            x  
#        0.5368       1.9678  
# 
# 
# $fit2
# 
# Call:
#   lm(formula = y ~ x)
# 
# Coefficients:
#   (Intercept)            x  
#        0.5545       1.9192 

Given that you have lots of models in your workspace, the only "manual" step is to provide a vector of your models' names (the names they are stored under). The get function then retrieves the model object for each name, and lapply saves them all in a list.
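With 200+ models, even typing the vector of names gets tedious. A possible shortcut, assuming (hypothetically) that every model follows a fit<number> naming scheme, is to let ls() find the names and use mget(), which returns a named list and so replaces the lapply()/get()/names() steps:

```r
set.seed(42)
x <- runif(100)
y <- 2 * x + runif(100)
fit1 <- lm(y ~ x)
fit2 <- lm(y ~ I(x^2))  # a second, illustrative model

# Assumed naming scheme: every model object is called fit<number>
model_names <- ls(pattern = "^fit[0-9]+$")

# mget() looks up all names at once and returns a named list,
# the same result as lapply(model_names, get) followed by names()
list_models <- mget(model_names)

# Guard against other objects that happen to match the pattern
list_models <- Filter(function(m) inherits(m, "lm"), list_models)

names(list_models)  # "fit1" "fit2"
```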


Store model objects in a dataset when you create them:

If you plan to store the model objects as you create them, you can build the data frame with dplyr and do:

library(dplyr)

set.seed(42)
x1 <- runif(100)
y1 <- 2*x1+runif(100)

set.seed(123)
x2 <- runif(100)
y2 <- 2*x2+runif(100)


model_formulas = c("y1~x1", "y2~x2")

data.frame(model_formulas, stringsAsFactors = FALSE) %>%
  group_by(model_formulas) %>%
  do(model = lm(.$model_formulas))

#     model_formulas   model
#              (chr)   (chr)
#   1          y1~x1 <S3:lm>
#   2          y2~x2 <S3:lm>

It REALLY depends on how "organised" the process that builds those 200+ models is. This approach works if your models depend on columns of a specific dataset. It will not work if you want to build models from various columns of different datasets (possibly in different workspaces), or models of different types (linear/logistic regression).
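For the organised case, the same formula-string idea also works without dplyr, which may be handy since current dplyr releases mark do() as superseded. A minimal base-R sketch that stores the fits in a list column of an ordinary data frame (it assumes, as above, that all variables live in the global environment):

```r
set.seed(42)
x1 <- runif(100)
y1 <- 2 * x1 + runif(100)

set.seed(123)
x2 <- runif(100)
y2 <- 2 * x2 + runif(100)

model_formulas <- c("y1 ~ x1", "y2 ~ x2")

models <- data.frame(model_formulas = model_formulas, stringsAsFactors = FALSE)

# as.formula() parses each string; lm() then finds y1, x1, ... through the
# formula's environment, which encloses the global environment here
models$model <- lapply(model_formulas, function(f) lm(as.formula(f)))

class(models$model[[1]])  # "lm"
```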


Store existing model objects in a dataset:

Actually, you can still use dplyr with the same philosophy as in the list approach. If the models are already built, you can use their names like this:

library(dplyr)

set.seed(42)
x <- runif(100)
y <- 2*x+runif(100)
fit1 <- lm(y ~x)

set.seed(123)
x <- runif(100)
y <- 2*x+runif(100)
fit2 <- lm(y ~x)


# manually select model names
model_names = c("fit1","fit2")

data.frame(model_names, stringsAsFactors = FALSE) %>%
  group_by(model_names) %>%
  do(model = get(.$model_names))


#   model_names   model
#         (chr)   (chr)
# 1        fit1 <S3:lm>
# 2        fit2 <S3:lm>
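Whichever way the data frame is built, the rest of the code can then loop over the model column. A sketch with two illustrative models (the second, a no-intercept fit, is made up for the example) that adds per-model summaries as ordinary columns:

```r
set.seed(42)
x <- runif(100)
y <- 2 * x + runif(100)

# Hypothetical models already stored in a list column
models <- data.frame(model_names = c("fit1", "fit2"), stringsAsFactors = FALSE)
models$model <- list(lm(y ~ x), lm(y ~ x - 1))

# Work on every stored model by iterating over the list column
models$slope <- vapply(models$model, function(m) coef(m)[["x"]], numeric(1))
models$r_squared <- vapply(models$model, function(m) summary(m)$r.squared, numeric(1))

# Pull a single model back out for anything else, e.g. predictions
predict(models$model[[1]], newdata = data.frame(x = 0.5))
```

vapply() is used instead of sapply() so that each summary is guaranteed to be a single number per model.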
AntoniosK answered Dec 20 '25 13:12



