
How to replace outliers with the 5th and 95th percentile values in R

I'd like to replace every value in my relatively large R dataset that lies above the 95th percentile or below the 5th percentile with the corresponding percentile value. My aim is to avoid simply dropping these outliers from the data.

Any advice would be much appreciated; I can't find information on how to do this anywhere else.

Bobbo asked Nov 12 '12 07:11



2 Answers

You can do it in one line of code using squish() from the scales package:

d2 <- scales::squish(d, quantile(d, c(.05, .95)))



In the scales package, look at ?squish and ?discard. squish() clamps out-of-range values to the nearest limit, while discard() drops them.

#--------------------------------
library(scales)

pr <- .95
q  <- quantile(d, c(1-pr, pr))
d2 <- squish(d, q)
#---------------------------------

# Note: depending on your needs, you may want to round the quantiles, i.e.:
q <- round(quantile(d, c(1-pr, pr)))

Example:

d <- 1:20
d
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20


d2 <- squish(d, round(quantile(d, c(.05, .95))))
d2
# [1]  2  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 19
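
If you would rather not depend on scales, the same clamping can be sketched in base R with pmin() and pmax(). The helper name clamp_to_quantiles is made up for illustration, and the sketch assumes a numeric vector with no missing values. Unlike the rounded example above, it leaves the quantiles unrounded.

```r
# Hypothetical helper (not part of scales): clamp x to its own quantiles.
clamp_to_quantiles <- function(x, probs = c(.05, .95)) {
  q <- quantile(x, probs, names = FALSE)
  # pmax() raises values below the lower bound,
  # pmin() lowers values above the upper bound.
  pmin(pmax(x, q[1]), q[2])
}

d  <- 1:20
d2 <- clamp_to_quantiles(d)
range(d2)
# [1]  1.95 19.05
```

Only the two extreme values change here; everything between the 5th and 95th percentiles is returned as-is.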
Ricardo Saporta answered Oct 16 '22 01:10


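A small base R function would do it:

fun <- function(x){
    # 5th and 95th percentiles of x
    quantiles <- quantile( x, c(.05, .95 ) )
    # pull values outside that range back to the nearest percentile
    x[ x < quantiles[1] ] <- quantiles[1]
    x[ x > quantiles[2] ] <- quantiles[2]
    x
}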
fun( yourdata )
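
Note that quantile() stops with an error when x contains NA unless na.rm = TRUE is set. Below is a minimal sketch of an NA-tolerant variant applied to every numeric column of a data frame. The name winsorize and the sample data frame df are assumptions made up for illustration, not part of the answer above.

```r
# Hypothetical variant of fun() that tolerates missing values.
winsorize <- function(x, probs = c(.05, .95)) {
  q <- quantile(x, probs, na.rm = TRUE, names = FALSE)
  # Leave NA entries untouched; clamp everything else.
  x[!is.na(x) & x < q[1]] <- q[1]
  x[!is.na(x) & x > q[2]] <- q[2]
  x
}

# Made-up example data: two numeric columns and one character column.
df <- data.frame(a  = c(1:19, 1000),
                 b  = c(NA, -50, 3:20),
                 id = letters[1:20])

# Apply the function to the numeric columns only.
num_cols <- vapply(df, is.numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], winsorize)
```

Working column by column means each variable is clamped to its own percentiles; non-numeric columns such as id pass through unchanged.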
Romain Francois answered Oct 16 '22 02:10