How to retrieve a list of the original variable names from a GLM call in R?

When using the glm function in R, one can apply functions such as addNA or log inside the formula argument. Suppose we have a data frame Data with 4 columns: Class and var1, which are factors, and var2 and var3, which are numeric. We fit:

Model <- glm(data    = Data,
             formula = Class ~ addNA(var1) + var2 + log(var3),
             family  = binomial)

In the glm output, var1 is now called addNA(var1) (e.g. in Model$xlevels), while var3 is called log(var3).
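To make the naming concrete, here is a minimal reproducible sketch. The question does not include Data, so the simulated values, factor levels and seed below are assumptions; only the column names and the formula come from the question.

```r
# Hypothetical simulated stand-in for the question's Data
set.seed(1)
n <- 200
Data <- data.frame(
  Class = factor(sample(c("no", "yes"), n, replace = TRUE)),
  var1  = factor(sample(c("a", "b", NA), n, replace = TRUE)),
  var2  = rnorm(n),
  var3  = rexp(n)  # strictly positive, so log(var3) is defined
)

Model <- glm(Class ~ addNA(var1) + var2 + log(var3),
             family = binomial, data = Data)

# Factor levels are recorded under the transformed term name
names(Model$xlevels)
## [1] "addNA(var1)"

# The coefficient names carry the transformations as well
names(coef(Model))
```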

Is it possible to retrieve a list from the glm output that indicates that var1, var2 and var3 were extracted from the dataframe, without addNA(var1) or log(var3) appearing in the variable names?

More generally, after the call to glm has been made, is it possible to infer which columns glm extracted from the input data frame before any transformations, interaction terms, etc. were generated inside the function?

Herman Sontrop asked Feb 15 '23 02:02


1 Answer

This works:

all.vars(formula(Model)[-2])
## [1] "var1" "var2" "var3"

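The [-2] index removes the response variable (Class) from the formula, leaving ~ addNA(var1) + var2 + log(var3), and all.vars() then returns only the bare variable names. However, you may be disappointed to find that the internally stored model frame contains the transformed variables rather than the original ones: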

names(model.frame(Model))
## [1] "Class"       "addNA(var1)" "var2"        "log(var3)"  
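If what you need are the untransformed columns themselves rather than just their names, one option is stats::get_all_vars(), which evaluates only the variables named in a formula against a data frame. The sketch below assumes the same hypothetical simulated Data as above (the question does not provide the real data).

```r
# Hypothetical simulated stand-in for the question's Data
set.seed(1)
n <- 200
Data <- data.frame(
  Class = factor(sample(c("no", "yes"), n, replace = TRUE)),
  var1  = factor(sample(c("a", "b", NA), n, replace = TRUE)),
  var2  = rnorm(n),
  var3  = rexp(n)
)
Model <- glm(Class ~ addNA(var1) + var2 + log(var3),
             family = binomial, data = Data)

# Columns keep their original names and untransformed values
raw <- get_all_vars(formula(Model), data = Data)
names(raw)
## [1] "Class" "var1"  "var2"  "var3"

# Predictors only, via the [-2] trick from the answer
predictors <- all.vars(formula(Model)[-2])
head(Data[predictors])
```

Here raw$var3 holds the original values, not log(var3) as in model.frame(Model).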

If you want the raw names including the response, then all.vars(getCall(Model)$formula) should also work, provided the formula was written out directly in the glm() call.
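The getCall() route depends on how glm() was called: the stored call only records the expression that was passed as formula. The sketch below (simulated data and the variable name f are hypothetical) shows the difference when the formula is kept in a variable, and also shows delete.response(terms(...)) as an alternative way to drop the response.

```r
# Hypothetical simulated stand-in for the question's Data
set.seed(1)
n <- 200
Data <- data.frame(
  Class = factor(sample(c("no", "yes"), n, replace = TRUE)),
  var1  = factor(sample(c("a", "b", NA), n, replace = TRUE)),
  var2  = rnorm(n),
  var3  = rexp(n)
)

# Formula stored in a variable before fitting (hypothetical name f)
f <- Class ~ addNA(var1) + var2 + log(var3)
Model2 <- glm(f, family = binomial, data = Data)

# The stored call only contains the symbol f ...
all.vars(getCall(Model2)$formula)
## [1] "f"

# ... whereas formula() returns the actual model formula
all.vars(formula(Model2))

# delete.response() drops the response from the terms object
all.vars(delete.response(terms(Model2)))
## [1] "var1" "var2" "var3"
```

Because formula() and terms() read from the fitted object rather than from the call, they give the same answer however the formula was supplied.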

Ben Bolker answered May 01 '23 02:05