How to remove outliers from a dataset

I've got some multivariate data of beauty vs. age. The ages range from 20 to 40 in steps of 2 (20, 22, 24, ..., 40), and each record has an age and a beauty rating from 1 to 5. When I draw boxplots of this data (age on the x-axis, beauty rating on the y-axis), some outliers are plotted outside the whiskers of each box.

I want to remove these outliers from the data frame itself, but I'm not sure how R calculates outliers for its box plots. Below is an example of what my data might look like.

[Image: example of the data]

asked Jan 24 '11 21:01 by Dan Q



People also ask

How do you clean up outliers in data?

We can calculate the mean and standard deviation of a sample and set the cut-off for outliers at 3 standard deviations from the mean, which gives a lower limit of mean − 3 × sd and an upper limit of mean + 3 × sd. Observations that fall outside these limits are then treated as outliers.
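The 3-standard-deviation rule can be sketched in R as follows. This is a minimal illustration, not a standard function: the name `remove_sd_outliers`, the parameter `k`, and the sample vector are hypothetical, and `NA` values are dropped along with the outliers.

```r
# Keep values within k sample standard deviations of the mean.
# The function name and the default k = 3 are illustrative assumptions.
remove_sd_outliers <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  lower <- m - k * s
  upper <- m + k * s
  # NA values are dropped along with the outliers
  x[!is.na(x) & x >= lower & x <= upper]
}

x <- c(rep(10, 20), 100)
remove_sd_outliers(x)  # the 100 lies more than 3 sd above the mean and is removed
```

One caveat: with the sample standard deviation, no point can lie more than (n − 1)/√n standard deviations from the mean, so with 10 or fewer observations this rule can never flag anything.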

How do you remove outliers from a data set in Excel?

One easy way to eliminate outliers in Excel is to sort the values in your dataset and manually delete the top and bottom values. To sort the data, select the dataset, go to Sort & Filter in the Editing group, and pick either Sort Smallest to Largest or Sort Largest to Smallest.

How can outliers be detected and removed?

Outliers can be detected visually (for example, with box plots), by applying a mathematical rule to the data (such as the 3-standard-deviation or 1.5 × IQR rule), or with a statistical approach.

What is the 1.5 rule for outliers?

A commonly used rule says that a data point is an outlier if it is more than $1.5 \cdot \text{IQR}$ above the third quartile or below the first quartile.
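R's `boxplot()` gets its outliers from `boxplot.stats()`, which applies this rule with `coef = 1.5`. One detail: the box edges it uses are Tukey's hinges from `fivenum()`, which can differ slightly from the quartiles returned by `quantile()` or `IQR()` with their default settings. The sketch below re-implements that calculation to show how it works; the function name `iqr_outliers` and the sample vector are hypothetical.

```r
# Re-implement the outlier rule used by boxplot.stats() (coef = 1.5 by default):
# a point is an outlier if it lies strictly more than coef * (hinge spread)
# below the lower hinge or above the upper hinge.
iqr_outliers <- function(x, coef = 1.5) {
  h <- fivenum(x)  # min, lower hinge, median, upper hinge, max (NAs dropped)
  spread <- h[4] - h[2]
  lower <- h[2] - coef * spread
  upper <- h[4] + coef * spread
  x[!is.na(x) & (x < lower | x > upper)]
}

x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)
iqr_outliers(x)
boxplot.stats(x)$out  # same result as the line above
```

For this vector the hinges are 2 and 4, so the fences are −1 and 7 and only 12 is flagged; `quantile(x, c(0.25, 0.75))` would instead give 2.25 and 4.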


1 Answer

Nobody has posted the simplest answer:

x[!x %in% boxplot.stats(x)$out] 

Also see this: http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/
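The one-liner works on a single vector, while the question draws one box per age. To filter the data frame the same way the plot is drawn, the same test can be applied within each age group. A minimal sketch, assuming a data frame `df` with columns `age` and `beauty` (the names and the ratings below are made up for illustration):

```r
# Hypothetical data: the column names `age` and `beauty` are assumptions,
# and the ratings are made up for illustration.
df <- data.frame(
  age    = rep(c(20, 22), each = 10),
  beauty = c(3, 3, 3, 3, 4, 4, 4, 4, 4, 1,  # age 20
             2, 2, 2, 3, 3, 3, 3, 4, 4, 4)  # age 22
)

# For each age group, mark the ratings that boxplot.stats() does NOT
# report as outliers, i.e. the points inside that group's whiskers.
keep <- ave(df$beauty, df$age, FUN = function(b) !b %in% boxplot.stats(b)$out)

# ave() keeps the numeric type of df$beauty, so convert 1/0 back to TRUE/FALSE.
df_clean <- df[as.logical(keep), ]

nrow(df_clean)
```

Here the rating of 1 at age 20 is the only value outside its group's whiskers, so one row is dropped and 19 remain.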

answered Oct 31 '22 13:10 by J. Win.
