How exactly are outliers removed in R boxplot and how can the same outliers be removed for further calculation (e.g. mean)?

Tags: r, outliers, mean

In a boxplot I've set the option outline=FALSE to hide the outliers.
Now I'd like to add points that show the mean to the boxplot. Obviously, the means calculated with mean() still include the outliers.

How can the very same outliers be removed from a dataframe so that the calculated mean corresponds to the data shown in the boxplot?

I know how outliers can be removed in general, but which rule does the outline option of boxplot use internally? Unfortunately, the manual does not clarify this.

591

asked Nov 20 '14 09:11

Gnark

1 Answer

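To answer the second part of your question, about how the outliers are chosen, it helps to recall how the boxplot is constructed:

the "body" (box) of the boxplot spans the first to the third quartile, i.e. the middle 50% of the data; its length is the interquartile range (IQR). R actually uses the "hinges", which are close to these quartiles.
each whisker extends to the most extreme data point that lies within 1.5*IQR of the end of that body (1.5 is the default of boxplot's range argument). Points beyond the whiskers are the outliers that outline=FALSE hides.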
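For the first part of the question, the rule described above can be reproduced with boxplot.stats(), the function that does the computation for boxplot(). The following is only a sketch under stated assumptions: the data frame df, its columns group and value, and the helper mean_without_outliers are hypothetical names made up for illustration.

```r
# Hypothetical example data (not from the question):
# two groups, each with one extreme value added by hand
set.seed(42)
df <- data.frame(
  group = rep(c("a", "b"), each = 30),
  value = c(rnorm(30), rnorm(30, mean = 2))
)
df$value[1]  <- 10
df$value[31] <- -8

# For a single vector, boxplot.stats() returns the box statistics and the outliers.
# coef = 1.5 corresponds to the default range = 1.5 of boxplot().
bs <- boxplot.stats(df$value[df$group == "a"], coef = 1.5)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values lying beyond the whiskers

# Hypothetical helper: mean of the values left after dropping the boxplot outliers
mean_without_outliers <- function(v, coef = 1.5) {
  out <- boxplot.stats(v, coef = coef)$out
  mean(v[!v %in% out], na.rm = TRUE)
}

# One mean per group, in the same order as the boxes
means <- tapply(df$value, df$group, mean_without_outliers)
```

After drawing the plot with boxplot(value ~ group, data = df, outline = FALSE), these means could be overlaid with points(seq_along(means), means, pch = 19). If a non-default range is passed to boxplot(), the same value should be passed as coef so that both use the same rule.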

If you assume that your data are normally distributed, the fraction of the data that lies beyond each whisker is:

1-pnorm(qnorm(0.75)+1.5*2*qnorm(0.75))

which is about 0.0035 per whisker. Therefore, about 0.7% of the values of a normally distributed variable are "boxplot outliers".
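This figure can be checked with a quick simulation. The sketch below assumes a standard normal sample of one million values and uses quantile() instead of the hinges, which makes no practical difference at this sample size. The variable names are made up for illustration.

```r
set.seed(123)
z <- rnorm(1e6)  # large standard normal sample

# Quartiles and interquartile range of the sample
q   <- quantile(z, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Fraction of points beyond either 1.5*IQR fence
frac_out <- mean(z < q[1] - 1.5 * iqr | z > q[2] + 1.5 * iqr)

# Theoretical value for both whiskers together (twice the one-sided tail)
theoretical <- 2 * (1 - pnorm(qnorm(0.75) + 1.5 * 2 * qnorm(0.75)))

c(simulated = frac_out, theoretical = theoretical)
```

Both numbers should come out close to 0.007, in line with the calculation above.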

However, this rule is not a very "reliable" way to detect outliers; there are packages specifically designed for that.

196

answered Sep 29 '22 22:09

agenis
