I was wondering if anyone could help me with the following problem. When I run a VIF analysis on a set of explanatory variables, I get the following error message.
test <-vif(lm(Spring_Autumn ~ Oct + Nov + Dec + Jan + Feb +
Mar + Apr + May + Jun + Jul + Aug + Sep + X1min + X3min + X7min + X30min + X90min + X1max + X3max + X7max + X30max + X90max + BF + Dmin + Dmax+ LP + LPD + HP + HPD + RR + FR + Rev, data = IHA_stats))
Error in vif.default(lm(Spring_Autumn ~ Oct + Nov + Dec + Jan + Feb + :
there are aliased coefficients in the model
After reading online, it seems I have two variables that are perfectly collinear, but I couldn't find any two perfectly correlated variables using the cor function, and I don't know how to interpret the table returned by the alias function. Does anyone have any suggestions? Thank you in advance.
James (a link to the original dataset is pasted below but can email if there are any issues with accessing this).
https://www.dropbox.com/s/nqmagu9m3mjhy9n/IHA_statistics.csv?dl=0
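One error you may encounter in R is: Error in vif.default(model) : there are aliased coefficients in the model. This error occurs when there is perfect multicollinearity in a regression model. That is, at least one predictor variable is an exact linear combination of other predictor variables. The dependence can involve three or more variables at once, so it does not have to show up as a pairwise correlation of exactly 1 in the output of cor.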
Having aliased coefficients in your model means that the square matrix X'X (where X is your design matrix) is singular, i.e., it has determinant of zero and is non-invertible. This is the classical problem of perfect multicollinearity.
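A small sketch of why cor can miss this, using simulated data rather than the dataset from the question (x1, x2, x3 and y are made-up names): x3 is built as an exact sum of two other predictors, so no pair of columns is perfectly correlated, yet the design matrix loses a rank and lm cannot estimate one coefficient.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2                  # exact linear combination of x1 and x2
y  <- 1 + x1 - x2 + rnorm(n)

# Largest absolute pairwise correlation among the predictors:
# it stays below 1 even though the predictors are linearly dependent
cors <- cor(cbind(x1, x2, x3))
max_offdiag <- max(abs(cors[upper.tri(cors)]))

# The design matrix has 4 columns (intercept + 3 predictors) but lower rank
X <- model.matrix(~ x1 + x2 + x3)
rank_X <- qr(X)$rank

# lm() reports the coefficient it could not estimate as NA
fit_demo <- lm(y ~ x1 + x2 + x3)
coef(fit_demo)
```

Comparing rank_X with ncol(X), or looking for NA entries in coef(fit_demo), is a quicker check for this situation than scanning a correlation matrix.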
Description: alias (from the stats package) finds aliases, i.e. linearly dependent terms, in a linear model specified by a formula.
Use the 'alias' function in R to see which variables are linearly dependent. Remove the dependent variables and the vif function should work correctly.
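As a rough guide to reading the alias output, here is a toy example on simulated data (the data frame d and the columns a, b, c are hypothetical, not from the question). In the Complete matrix, each row is a term that could not be estimated, and the columns are the terms it can be written in terms of.

```r
set.seed(2)
d <- data.frame(a = rnorm(30), b = rnorm(30))
d$c <- 2 * d$a - d$b           # c depends exactly on a and b
d$y <- rnorm(30)

fit_alias <- lm(y ~ a + b + c, data = d)
al <- alias(fit_alias)

# Rows: aliased (linearly dependent) terms.
# Columns: the remaining terms; non-zero entries show which of them
# the aliased term is built from (here c was constructed from a and b).
al$Complete

# Names of the terms to consider dropping
aliased_terms <- rownames(al$Complete)
aliased_terms
```

A row with non-zero entries in several columns means that the term is a combination of all of those variables, which is why no single pair needs to be perfectly correlated.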
library(car)  # provides vif()

# the data frame is passed to lm(), not to the formula
formula <- Spring_Autumn ~ Oct + Nov + Dec + Jan + Feb + Mar + Apr + May +
  Jun + Jul + Aug + Sep + X1min + X3min + X7min + X30min + X90min +
  X1max + X3max + X7max + X30max + X90max + BF + Dmin + Dmax + LP + LPD +
  HP + HPD + RR + FR + Rev
fit <- lm(formula, data = IHA_stats)

# the linearly dependent variables (row names of the "Complete" alias matrix)
ld.vars <- rownames(alias(fit)$Complete)

# remove the linearly dependent variables by appending "- var" terms to the formula
formula.new <- as.formula(
  paste(
    paste(deparse(formula), collapse = ""),
    paste(ld.vars, collapse = "-"),
    sep = "-"
  )
)

# run the model again without the aliased terms
fit.new <- lm(formula.new, data = IHA_stats)
vif(fit.new)
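NOTE: This will not work if the model contains automatically generated dummy variables (from factors) that are identical to other variables. The aliased coefficient names reported for such dummies (the factor name with the level appended) are not valid terms in the formula, so they cannot be subtracted from it. In that case you will need your own workaround.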
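One possible workaround for the dummy-variable case, sketched on simulated data (the names d, g, is_mid and X_ok are made up for illustration): work on the model matrix instead of the formula and keep only the columns that the QR decomposition treats as linearly independent. The VIFs are then computed by hand as the diagonal of the inverse correlation matrix of the kept columns. This is a per-column calculation, not car::vif, which reports generalized VIFs for factors.

```r
set.seed(3)
d <- data.frame(
  g = factor(rep(c("low", "mid", "high"), each = 10)),
  x = rnorm(30)
)
d$is_mid <- as.numeric(d$g == "mid")  # duplicates the auto-generated dummy for level "mid"
d$y <- rnorm(30)

# Expand factors into dummy columns
X  <- model.matrix(y ~ g + x + is_mid, data = d)

# qr() moves linearly dependent columns to the end of the pivot order
qx      <- qr(X)
keep    <- qx$pivot[seq_len(qx$rank)]
X_ok    <- X[, keep, drop = FALSE]
dropped <- setdiff(colnames(X), colnames(X_ok))
dropped

# Refit on the independent columns (X_ok already contains the intercept)
fit_ok <- lm(d$y ~ X_ok - 1)

# Per-column VIFs by hand: diagonal of the inverse correlation matrix
preds <- X_ok[, colnames(X_ok) != "(Intercept)", drop = FALSE]
vifs  <- diag(solve(cor(preds)))
vifs
```

Which of two identical columns gets dropped depends on column order, so it is worth checking dropped before interpreting the remaining coefficients.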