Selecting the statistically significant variables in an R glm model

Tags:

r

glm

I have an outcome variable, say Y, and a list of 100 dimensions that could affect Y (say X1...X100).

After running my glm and viewing a summary of my model, I see those variables that are statistically significant. I would like to be able to select those variables and run another model and compare performance. Is there a way I can parse the model summary and select only the ones that are significant?

Pritish Kakodkar asked Apr 22 '13


People also ask

How do you determine significant variables in regression?

The overall F-test determines whether the relationship between the outcome and the full set of predictors is statistically significant. If the p-value for the overall F-test is less than your significance level, you can conclude that the R-squared value is significantly different from zero.
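
For example, with a fitted lm (a Gaussian glm gives the same fit) you can pull the overall F statistic out of the summary and convert it to a p-value yourself. A minimal sketch using R's built-in mtcars data, with fit, fstat and p.overall as illustrative names:

fit <- lm(mpg ~ wt + hp, data = mtcars)
fstat <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p.overall <- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)
p.overall                          # p-value of the overall F-test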

How do you choose the best variables for a linear regression?

When building a linear or logistic regression model, you should consider including: variables that are already shown in the literature to be related to the outcome; variables that can be considered a cause of the exposure, the outcome, or both; and interaction terms of variables that have large main effects.

How do you choose the best predictor variable?

Generally, the variable with the highest correlation with the outcome is a good predictor. You can also compare coefficients to select the best predictor (make sure you normalize the data before you run the regression, and take the absolute value of the coefficients). You can also look at the change in the R-squared value.
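
A minimal sketch of that idea, again using the built-in mtcars data with mpg as the outcome (cors and scaled.fit are just illustrative names): rank candidates by absolute correlation with the outcome, then compare coefficients fitted on standardized data.

cors <- sapply(mtcars[, -1], cor, y = mtcars$mpg)   # correlation of each predictor with mpg
sort(abs(cors), decreasing = TRUE)                  # highest correlation first

scaled.fit <- lm(mpg ~ ., data = as.data.frame(scale(mtcars)))
sort(abs(coef(scaled.fit)[-1]), decreasing = TRUE)  # standardized coefficients, intercept dropped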


1 Answer

Although @kith paved the way, there is more that can be done. Actually, the whole process can be automated. First, let's create some data:

# simulated example data
x1 <- rnorm(10)
x2 <- rnorm(10)
x3 <- rnorm(10)
y  <- rnorm(10)
x4 <- y + 5  # x4 is a linear function of y, so it will be a nicely significant variable to test our code
(mydata <- as.data.frame(cbind(x1, x2, x3, x4, y)))

Our model is then:

model <- glm(formula = y ~ x1 + x2 + x3 + x4, data = mydata)

And the Boolean vector of the coefficients can indeed be extracted by:

toselect.x <- summary(model)$coeff[-1, 4] < 0.05  # column 4 holds the p-values; [-1, ] drops the intercept (credit to kith)

But this is not all! In addition, we can do this:

# names of the significant variables
relevant.x <- names(toselect.x)[toselect.x == TRUE]
# formula containing only the significant variables
sig.formula <- as.formula(paste("y ~", paste(relevant.x, collapse = " + ")))

EDIT: as subsequent posters have pointed out, the paste() call needs collapse = "+" so that all significant variables, not just the first, end up in the formula; the line above already includes that fix.

And run the regression with only the significant variables, as the OP originally wanted:

sig.model <- glm(formula = sig.formula, data = mydata)

In this case the estimate for x4 will be equal to 1, since we defined x4 as y + 5, i.e. a perfect linear relationship.
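
The OP also wanted to compare the performance of the two models. The question does not name a metric, so as one reasonable choice, you can compare AIC and run an analysis-of-deviance test on the nested models:

AIC(model, sig.model)                 # lower AIC indicates the better trade-off
anova(sig.model, model, test = "F")   # does the full model improve the fit significantly?

(With this toy data the reduced model fits perfectly, so the numbers come out degenerate; on real data this is a standard comparison.)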

Maxim.K answered Oct 13 '22