
How to produce leverage stats?

I know how to produce the plots using leveragePlot(), but I cannot find a way to produce a leverage statistic for each observation, like the one in MegaStat output.

Travis asked Feb 28 '12 04:02


People also ask

How do you calculate leverage in statistics?

Leverage measures how far an observation's predictor values are from the mean of the predictors. In general, for a model with an intercept, 1/n ≤ h_i ≤ 1. Where there are k independent variables in the model, the mean value for leverage is (k+1)/n. A rule of thumb (Stevens') is that values more than 3 times this mean value are considered large.
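The mean leverage quoted above can be checked numerically. This is a minimal sketch in R, assuming the built-in mtcars data and a two-predictor model chosen only for illustration:

```r
# Illustrative model (an assumption, not from the quoted text): k = 2 predictors
fit <- lm(hp ~ cyl + mpg, data = mtcars)

h <- hatvalues(fit)          # leverage h_i for each observation
n <- nrow(mtcars)            # number of observations
k <- length(coef(fit)) - 1   # number of predictors, excluding the intercept

# Mean leverage equals (k + 1) / n, because the hat values sum to k + 1
all.equal(mean(h), (k + 1) / n)

# Compare the observed range of leverages with the bounds 1/n and 1
range(h)
c(lower = 1 / n, upper = 1)

# Observations above three times the mean leverage (may be empty)
h[h > 3 * (k + 1) / n]
```

The 3x cut-off is only a screening rule; an observation it flags is worth a closer look, not automatic removal.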

What is leverage statistics?

In statistics and in particular in regression analysis, leverage is a measure of how far away the independent variable values of an observation are from those of the other observations. High-leverage points, if any, are outliers with respect to the independent variables.

What is considered high leverage in statistics?

A data point has high leverage if it has "extreme" predictor x values. With a single predictor, an extreme x value is simply one that is particularly high or low.


1 Answer

I think you're looking for the hat values.

Use hatvalues(fit). The rule of thumb is to examine any observations whose hat value is more than 2-3 times the average hat value. I don't know of a specific function or package off the top of my head that provides this info in a nice data frame, but doing it yourself is fairly straightforward. Here's an example:

    fit <- lm(hp ~ cyl + mpg, data = mtcars)  # a fake model

    hatvalues(fit)

    hv <- as.data.frame(hatvalues(fit))
    mn <- mean(hatvalues(fit))
    # flag hat values above 3x the mean as 'x3', above 2x the mean as 'x2'
    hv$warn <- ifelse(hv[, 'hatvalues(fit)'] > 3 * mn, 'x3',
               ifelse(hv[, 'hatvalues(fit)'] > 2 * mn, 'x2', '-'))

    hv

For larger data sets you could use subset and/or order to look at just certain ranges of hat values:

    subset(hv, warn == "x3")
    subset(hv, warn %in% c("x2", "x3"))
    hv[order(hv[, 'hatvalues(fit)']), ]  # sort by hat value, smallest first
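The steps above can also be wrapped into a small helper that returns a data frame with plainly named columns. This is only a sketch: leverage_table() is a hypothetical name made up here, not a function from any package, and the column names are likewise assumptions.

```r
# leverage_table() is a hypothetical helper name, not part of any package
leverage_table <- function(fit) {
  h  <- hatvalues(fit)
  mn <- mean(h)
  out <- data.frame(
    obs      = names(h),
    leverage = unname(h),
    ratio    = unname(h) / mn,   # leverage relative to the average hat value
    stringsAsFactors = FALSE
  )
  # Flag the same 2x and 3x cut-offs used in the rule of thumb
  out$warn <- ifelse(out$ratio > 3, "x3",
              ifelse(out$ratio > 2, "x2", "-"))
  # Largest leverage first
  out[order(out$leverage, decreasing = TRUE), ]
}

fit <- lm(hp ~ cyl + mpg, data = mtcars)
lev <- leverage_table(fit)
head(lev)                   # highest-leverage observations
subset(lev, warn != "-")    # only the flagged rows
```

Using a named column such as leverage avoids having to index by the awkward 'hatvalues(fit)' column name.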

I actually came across a nice plot function that does this in the book R in Action, but as this is a copyrighted book I will not display Kabacoff's intellectual property. That plot would work even better for mid-sized data sets.

Here is a decent hat plot though that you may also want to investigate:

    plot(hatvalues(fit), type = "h")
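If it helps, reference lines for the 2x and 3x rule of thumb can be drawn on the same kind of plot with base graphics. This is a rough sketch that refits the same fake model; the line types and labels are arbitrary choices:

```r
fit <- lm(hp ~ cyl + mpg, data = mtcars)  # same fake model as in the answer
h   <- hatvalues(fit)
mn  <- mean(h)

plot(h, type = "h", xlab = "Observation index", ylab = "Hat value")
abline(h = c(2, 3) * mn, lty = c(2, 3))   # dashed: 2x mean, dotted: 3x mean

# Label any observations above the 2x line with their row names
flagged <- which(h > 2 * mn)
if (length(flagged) > 0) {
  text(flagged, h[flagged], labels = names(h)[flagged], pos = 3, cex = 0.7)
}
```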
Tyler Rinker answered Nov 15 '22 16:11