How does plot.lm() determine outliers for residual vs fitted plot?

How does plot.lm() determine which points are outliers (that is, which points to label) in the residuals vs fitted plot? The only thing I found in the documentation is this:

Details

sub.caption—by default the function call—is shown as a subtitle (under the x-axis title) on each plot when plots are on separate pages, or as a subtitle in the outer margin (if any) when there are multiple plots per page.

The ‘Scale-Location’ plot, also called ‘Spread-Location’ or ‘S-L’ plot, takes the square root of the absolute residuals in order to diminish skewness (sqrt(|E|) is much less skewed than |E| for Gaussian zero-mean E).

The ‘S-L’, the Q-Q, and the Residual-Leverage plot, use standardized residuals which have identical variance (under the hypothesis). They are given as R[i] / (s * sqrt(1 - h.ii)) where h.ii are the diagonal entries of the hat matrix, influence()$hat (see also hat), and where the Residual-Leverage plot uses standardized Pearson residuals (residuals.glm(type = "pearson")) for R[i].

The Residual-Leverage plot shows contours of equal Cook's distance, for values of cook.levels (by default 0.5 and 1) and omits cases with leverage one with a warning. If the leverages are constant (as is typically the case in a balanced aov situation) the plot uses factor level combinations instead of the leverages for the x-axis. (The factor levels are ordered by mean fitted value.)

In the Cook's distance vs leverage/(1-leverage) plot, contours of standardized residuals that are equal in magnitude are lines through the origin. The contour lines are labelled with the magnitudes.
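As a sanity check on the formula quoted above, R[i] / (s * sqrt(1 - h.ii)), here is a minimal sketch that computes the standardized residuals by hand and compares them with rstandard(). The cars dataset and the variable names (res, s, h, r_manual) are illustrative choices of mine, not part of the documentation.

```r
# Standardized residuals by hand, following R[i] / (s * sqrt(1 - h.ii)).
# Dataset and object names are illustrative assumptions.
fit <- lm(dist ~ speed, data = cars)

res <- residuals(fit)        # R[i]: ordinary residuals of the lm fit
s   <- summary(fit)$sigma    # s: residual standard error
h   <- hatvalues(fit)        # h.ii: diagonal entries of the hat matrix

r_manual <- res / (s * sqrt(1 - h))

# Compare with the built-in helper (agreement up to floating-point error)
all.equal(r_manual, rstandard(fit))
```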

But the documentation says nothing about how the residuals vs fitted plot is generated or how it chooses which points to label.

Update: Zheyuan Li's answer suggests that the residuals vs fitted plot simply labels the 3 points with the largest residuals. This is indeed the case, as the following "extreme" example demonstrates.

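# y is an exact linear function of x, so the fit is essentially perfect
x = c(1,2,3,4,5,6)
y = c(2,4,6,8,10,12)
foo = data.frame(x,y)
model = lm(y ~ x, data = foo)
plot(model, which = 1)  # residuals vs fitted plot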



asked Aug 31 '16 21:08

3x89g2

1 Answer

The labelled points are the three observations with the largest absolute standardised residuals. Consider this example:

fit <- lm(dist ~ speed, cars)
plot(fit, which = 1)


r <- rstandard(fit)  ## get standardised residuals
order(abs(r), decreasing = TRUE)[1:3]
# [1] 49 23 35
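The number of labelled points is controlled by the id.n argument of plot.lm(), which defaults to 3. The sketch below changes it, and also ranks the ordinary residuals next to the standardised ones so the two orderings can be compared. As far as I can tell from the plot.lm source, the which = 1 panel ranks ordinary residuals while the Q-Q and scale-location panels rank standardised ones; treat that as an assumption and check stats:::plot.lm in your own R version. The names top_raw and top_std are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)

# Top three observations by absolute ordinary and standardised residuals
top_raw <- order(abs(residuals(fit)), decreasing = TRUE)[1:3]
top_std <- order(abs(rstandard(fit)), decreasing = TRUE)[1:3]
top_raw
top_std

# id.n sets how many of the most extreme points get a label
plot(fit, which = 1, id.n = 5)  # label five points instead of three
plot(fit, which = 1, id.n = 0)  # suppress the labels entirely
```

If the two rankings ever differ for a model, comparing them against the labels drawn on each panel shows which residual type that panel uses.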


answered Oct 19 '22 17:10

Zheyuan Li

Tags: plot, r, linear-regression, regression, lm
