
VIFs returning aliased coefficients in R

I was wondering if anyone could help me with the following problem. When I run a VIF analysis on a set of explanatory variables, I get the following error message.

test <- vif(lm(Spring_Autumn ~ Oct + Nov + Dec + Jan + Feb +
    Mar + Apr + May + Jun + Jul + Aug + Sep + X1min + X3min + X7min +
    X30min + X90min + X1max + X3max + X7max + X30max + X90max + BF +
    Dmin + Dmax + LP + LPD + HP + HPD + RR + FR + Rev, data = IHA_stats))


Error in vif.default(lm(Spring_Autumn ~ Oct + Nov + Dec + Jan + Feb +  : 
  there are aliased coefficients in the model

After reading online, it seems I have two variables that are perfectly collinear, but I couldn't find any pair of perfectly correlated variables using the cor function, and I don't know how to interpret the table produced by the alias function. Does anyone have any suggestions? Thank you in advance.

James (a link to the original dataset is pasted below; I can email it if there are any issues accessing it).

https://www.dropbox.com/s/nqmagu9m3mjhy9n/IHA_statistics.csv?dl=0

James White Avatar asked Mar 05 '15 18:03



People also ask

What are aliased coefficients in R?

One error you may encounter in R is: Error in vif.default(model) : there are aliased coefficients in the model. This error occurs when there is perfect multicollinearity in a regression model. That is, at least one predictor variable is an exact linear combination of other predictors. Predictors that are highly but not perfectly correlated produce large VIF values rather than this error.

What are aliased coefficients in the model?

Having aliased coefficients in your model means that the square matrix X'X (where X is your design matrix) is singular, i.e., it has determinant of zero and is non-invertible. This is the classical problem of perfect multicollinearity.
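
A small sketch of this, using made-up toy data (the names x1, x2, x3 and y are hypothetical and not from the question's dataset). It suggests why cor() may show no perfectly correlated pair even though the design matrix is rank deficient: the dependence can involve three or more variables at once.

```r
# Toy data (hypothetical): x3 is an exact sum of x1 and x2
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
x3 <- x1 + x2
y  <- rnorm(50)

# Largest absolute pairwise correlation among the predictors
cors <- cor(cbind(x1, x2, x3))
max_offdiag <- max(abs(cors[upper.tri(cors)]))
max_offdiag            # below 1: no single pair is perfectly correlated

# The design matrix has fewer independent columns than columns,
# so X'X is singular
X <- model.matrix(y ~ x1 + x2 + x3)
design_rank <- qr(X)$rank
c(rank = design_rank, columns = ncol(X))

# lm() still fits, but reports the aliased coefficient as NA
fit <- lm(y ~ x1 + x2 + x3)
coef(fit)
```

Comparing the rank of the design matrix with its number of columns is a quick check that does not rely on inspecting pairwise correlations.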

What does the alias function do in R?

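Description. In the stats package, alias() finds aliases (linearly dependent terms) in a linear model specified by a formula. It should not be confused with .Alias, which creates an alias to (part of) another R object and is more memory-efficient than the usual assignment.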


1 Answer

Use the 'alias' function in R to see which variables are linearly dependent. Remove the dependent variables and the vif function should work correctly.

library(car)  # provides vif()

formula <- Spring_Autumn ~ Oct + Nov + Dec + Jan + Feb + Mar + Apr + May +
    Jun + Jul + Aug + Sep + X1min + X3min + X7min + X30min + X90min +
    X1max + X3max + X7max + X30max + X90max + BF + Dmin + Dmax + LP +
    LPD + HP + HPD + RR + FR + Rev
fit <- lm(formula, data = IHA_stats)

# the linearly dependent (aliased) variables
ld.vars <- rownames(alias(fit)$Complete)

# remove the linearly dependent variables from the formula
formula.new <- as.formula(
    paste(
        paste(deparse(formula), collapse=""),
        paste(ld.vars, collapse="-"),
        sep="-"
    )
)

# run the model again
fit.new <- lm(formula.new, data = IHA_stats)
vif(fit.new)

NOTE: This will not work if the aliased terms are automatically generated dummy variables (for example, levels of a factor) that are identical to other variables. In that case the coefficient names no longer match the variable names in the formula, so you will need your own workaround.
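
As a rough illustration of how to read the alias table, here is a minimal sketch on made-up data (the data frame d and its columns a, b, c and y are hypothetical). It also uses update() to drop the aliased terms, which is an alternative to pasting formula strings and not the approach in the code above.

```r
# Toy data (hypothetical): c is an exact linear combination of a and b
set.seed(42)
d <- data.frame(a = rnorm(30), b = rnorm(30))
d$c <- 2 * d$a - d$b
d$y <- rnorm(30)

fit <- lm(y ~ a + b + c, data = d)

# Each row of $Complete is an aliased term; the columns are the terms
# kept in the model, and the entries describe how the aliased term
# depends on them.
al <- alias(fit)$Complete
print(al)

# Drop the aliased terms with update() and refit
dropped <- rownames(al)
drop_formula <- as.formula(paste(". ~ . -", paste(dropped, collapse = " - ")))
fit2 <- update(fit, drop_formula)
coef(fit2)
```

Once no coefficient is NA, vif() from the car package should run on the refitted model. Which of the dependent variables lm() treats as aliased depends on their order in the formula, so it may be worth choosing which one to drop on subject-matter grounds.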

Stewbaca Avatar answered Sep 30 '22 06:09
