Could someone help me interpret the output of R's alias() function when testing for multicollinearity in a multiple regression model? I know some of the predictor variables in my model are highly correlated, and I want to identify them using the alias table.
Model :
Score ~ Comments + Pros + Cons + Advice + Response + Value + Recommendation
+ 6Months + 12Months + 2Years + 3Years + Daily + Weekly + Monthly
Complete :
                (Intercept) Comments Pros Cons Advice Response Value1
UseMonthly1     0           0        0    0    0      0        0
                Recommendation1 6Months1 12Months1 2Years1
UseMonthly1     0               1        1         1
                3Years1 Daily1 Weekly1
UseMonthly1     1       -1     -1
Value, Recommendation, 6Months, 12Months, 2Years, 3Years, Daily, Weekly, and Monthly are binary categorical variables.
Score, Comments, Pros, Cons, Advice, and Response are numeric variables.
Can I assume UseMonthly is highly correlated with 6Months, 12Months, 2Years, 3Years, Daily, and Weekly? What is the difference between the 1 and -1 values in the alias output? Do they indicate positive and negative correlation?
Nonzero entries in the "Complete" matrix show that those terms are linearly dependent on UseMonthly. This means they're highly correlated with it, but note that terms can be highly correlated without being exactly linearly dependent, so the alias table alone won't flag every collinearity problem.
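The 1 and -1 values are not correlation signs: they are the coefficients of the exact linear combination of the other dummy columns that reproduces the UseMonthly1 column of the model matrix. Here is a minimal sketch with made-up data and hypothetical variable names (x1, x2, x3) that produces the same kind of pattern:

# x3 is constructed as an exact linear combination of x1 and x2
set.seed(1)
d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
d$x3 <- d$x1 - d$x2        # exact dependence: x3 = 1*x1 + (-1)*x2
d$y  <- rnorm(20)

fit <- lm(y ~ x1 + x2 + x3, data = d)
alias(fit)$Complete        # the x3 row should show 1 under x1 and -1 under x2

Read your output the same way: the UseMonthly1 dummy equals 6Months1 + 12Months1 + 2Years1 + 3Years1 - Daily1 - Weekly1 (in terms of the corresponding columns of the model matrix), not a set of positive and negative correlations.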
If your purpose is to identify and remove correlated variables, you should remove UseMonthly, but you'll probably want to remove others as well. A common way to identify variables that are problematic with respect to multicollinearity is to look for large variance inflation factors (calculated by, e.g., car::vif).
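For example, a rough sketch, assuming your fitted model object is called fit and the car package is installed (both are assumptions on my part, not something from your post):

library(car)

fit2 <- update(fit, . ~ . - UseMonthly)   # drop the aliased term first; vif()
                                          # stops with an error while aliased
                                          # coefficients remain in the model
vif(fit2)                                 # as a rough rule of thumb, values above
                                          # about 5-10 flag problematic predictors

Since all of your categorical predictors are binary, each term uses a single degree of freedom, so vif() returns ordinary VIFs rather than generalized VIFs.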