How to retrieve a list of the original variable names from a GLM call in R?

Question

When using the glm function in R, one can use functions like addNA or log inside the formula argument. Say we have a data frame Data with four columns: Class and var1, which are factors, and var2 and var3, which are numeric. We then fit:

Model <- glm(data    = Data,
             formula = Class ~ addNA(var1) + var2 + log(var3),
             family  = binomial)

In the glm output, var1 now appears as addNA(var1) (e.g. in Model$xlevels), and var3 appears as log(var3).

Is it possible to retrieve a list from the glm output that indicates that var1, var2 and var3 were extracted from the dataframe, without addNA(var1) or log(var3) appearing in the variable names?

More generally, once glm has been called, is it possible to infer which columns it extracted from the input data frame, before any transformations, cross terms, etc. were generated inside glm?

Ben Bolker · Accepted Answer

This works:

all.vars(formula(Model)[-2])
## [1] "var1" "var2" "var3"

The [-2] indexing removes the response variable from the formula. Note, however, that the internally stored model frame holds the transformed variables, not the original ones:

names(model.frame(Model))
## [1] "Class"       "addNA(var1)" "var2"        "log(var3)"
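To see why the [-2] indexing drops the response, it may help to look at a formula on its own. This is a minimal sketch that needs no data or fitted model; the names f and rhs_only are just illustrative.

```r
# A two-sided formula is stored as a call with three elements:
# the `~` operator, the left-hand side (response), and the right-hand side.
f <- Class ~ addNA(var1) + var2 + log(var3)

length(f)   # 3
f[[1]]      # `~`
f[[2]]      # Class (the response)

# Dropping element 2 leaves a one-sided formula with only the predictors.
rhs_only <- f[-2]
rhs_only

# all.vars() walks the expression and returns bare variable names,
# ignoring function names such as addNA and log.
all.vars(rhs_only)
all.vars(f[[2]])
```

Because all.vars() only collects symbols used as variables, wrappers such as addNA() and log() drop out automatically.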

If you want the raw names including the response, then all.vars(getCall(Model)$formula) should work; it returns "Class" in addition to the three predictors.
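Here is a reproducible sketch of the approaches above. The data are simulated purely for illustration (the original Data is not shown in the question), and Model2, predictors, response and used_columns are hypothetical names. It also shows one caveat worth checking in your own code: if the formula is passed as a variable, the stored call contains only that variable's name.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the question's data frame (an assumption).
Data <- data.frame(
  Class = factor(sample(c("a", "b"), n, replace = TRUE)),
  var1  = factor(sample(c("x", "y", NA), n, replace = TRUE)),
  var2  = rnorm(n),
  var3  = rexp(n) + 0.1   # strictly positive so log() is defined
)

Model <- glm(Class ~ addNA(var1) + var2 + log(var3),
             data = Data, family = binomial)

# Untransformed predictor names, and the response by set difference.
predictors <- all.vars(formula(Model)[-2])
response   <- setdiff(all.vars(formula(Model)), predictors)

# Restrict to names that really are columns of the data frame, since
# all.vars() also returns any other variable mentioned in the formula.
used_columns <- intersect(all.vars(formula(Model)), names(Data))

# The stored model frame keeps the transformed names instead.
names(model.frame(Model))

# Caveat: with the formula held in a variable, the call records the symbol.
f <- Class ~ addNA(var1) + var2 + log(var3)
Model2 <- glm(f, data = Data, family = binomial)
all.vars(getCall(Model2)$formula)   # "f", not the variable names
all.vars(formula(Model2))           # still the original variable names
```

For that reason formula(Model) looks like the safer starting point than getCall(Model)$formula when you do not control how the model was fitted.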

Tags: r, glm, feature-selection, model-fitting

Herman Sontrop
