How exactly are outliers removed in R boxplot and how can the same outliers be removed for further calculation (e.g. mean)?

Tags: r, outliers, mean

In a boxplot I've set the option outline=FALSE to remove the outliers.
Now I'd like to include points that show the mean in the boxplot. Obviously, the means calculated using mean include the outliers.

How can the very same outliers be removed from a dataframe so that the calculated mean corresponds to the data shown in the boxplot?

I know how outliers can be removed, but which settings are used by the outline option from boxplot internally? Unfortunately, the manual does not give any clarifications.

Gnark asked Nov 20 '14 09:11



1 Answer

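To answer the second part of your question, about how the outliers are chosen, it helps to recall how the boxplot is constructed:

  • the "body" (box) of the boxplot spans from the first quartile to the third quartile, i.e. the middle 50% of the data; its length is the interquartile range (IQR)
  • each whisker extends to the most extreme data point that lies within 1.5*IQR of the corresponding end of the box (1.5 is the default of the range argument of boxplot); points beyond that are the outliers that outline=FALSE hides.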
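
As a sketch of how this maps to R: boxplot() calls boxplot.stats() internally, whose coef argument (default 1.5, matching the range argument of boxplot) sets the whisker length and whose $out component lists the points drawn as outliers. Assuming a numeric vector x (hypothetical example data, not from the question), the mean of the remaining values could be computed roughly like this:

```r
# Hypothetical example data: 20 regular values plus two extreme ones
x <- c(1:20, 60, -45)

# boxplot.stats() applies the rule boxplot() uses (coef = 1.5 by default)
bs <- boxplot.stats(x, coef = 1.5)

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values that boxplot() would draw as outlier points

# Keep only values inside the whisker limits, then take the mean
kept <- x[x >= bs$stats[1] & x <= bs$stats[5]]
mean_no_outliers <- mean(kept)
```

For grouped data in a data frame, the same idea can be applied per group, for example with tapply(df$value, df$group, function(v) mean(v[!v %in% boxplot.stats(v)$out])), where df, value and group are placeholder names. Note that boxplot.stats() uses the hinges from fivenum() for the box, which can differ slightly from quantile(x, c(0.25, 0.75)).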

If you assume that your data are normally distributed, the proportion of data expected beyond each whisker limit is:

1-pnorm(qnorm(0.75)+1.5*2*qnorm(0.75))

which is about 0.0035. Counting both whiskers, a normal variable therefore has about 0.7% of "boxplot outliers".
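
The same calculation can be spelled out step by step. This is only a sketch for a standard normal variable; the variable names are chosen here for illustration:

```r
# Upper whisker limit for a standard normal: Q3 + 1.5 * IQR
q3 <- qnorm(0.75)
iqr <- 2 * q3                 # by symmetry, Q1 = -Q3
upper_limit <- q3 + 1.5 * iqr

p_one_side <- 1 - pnorm(upper_limit)  # probability mass beyond one whisker limit
p_both <- 2 * p_one_side              # both tails together
round(c(one_side = p_one_side, both = p_both), 4)
```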

But this is not a very "reliable" way to detect outliers; there are packages specifically designed for this.

agenis answered Sep 29 '22 22:09
