How to replace outliers with the 5th and 95th percentile values in R

Tags: r, dataset, outliers, quantile

I'd like to replace all values in my relatively large R dataset that fall above the 95th percentile or below the 5th percentile with those percentile values, respectively. My aim is to avoid simply dropping these outliers from the data entirely.

Any advice would be much appreciated; I can't find any information on how to do this elsewhere.


asked Nov 12 '12 07:11

Bobbo

2 Answers

You can do it in one line of code using squish():


d2 <- squish(d, quantile(d, c(.05, .95)))

squish() is in the scales package; see ?squish and ?discard for details.
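The difference between the two help pages matters for this question: squish() caps out-of-range (finite) values at the bounds, whereas discard() removes them. The base-R sketch below mimics both behaviours on the 1:20 example without calling scales itself; the helper names cap_values and drop_values are hypothetical and exist only for this illustration.

```r
# Base-R sketch contrasting two ways of handling values outside [lo, hi].
# cap_values() and drop_values() are hypothetical helpers, not scales functions.
cap_values <- function(x, lo, hi) {
  # pmax() raises anything below lo; pmin() lowers anything above hi
  pmin(pmax(x, lo), hi)
}

drop_values <- function(x, lo, hi) {
  # keep only the observations that lie inside [lo, hi]
  x[x >= lo & x <= hi]
}

d <- 1:20
capped  <- cap_values(d, 2, 19)   # like squish() on finite data
dropped <- drop_values(d, 2, 19)  # like discard()

length(capped)   # 20: every observation is kept
length(dropped)  # 18: the two extreme observations are gone
```

Only capping keeps all observations, which is what the question asks for. Note that squish() leaves infinite values alone by default, while this pmin()/pmax() version would cap them too.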


#--------------------------------
library(scales)

pr <- .95
q  <- quantile(d, c(1-pr, pr))
d2 <- squish(d, q)
#---------------------------------

# Note: depending on your needs, you may want to round the quantiles, i.e.:
q <- round(quantile(d, c(1-pr, pr)))

Example:


d <- 1:20
d
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20


d2 <- squish(d, round(quantile(d, c(.05, .95))))
d2
# [1]  2  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 19
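The bounds 2 and 19 in this example come from rounding R's default quantile() estimate (type 7 in ?quantile), which interpolates linearly between the sorted values x_(1), ..., x_(n). With d = 1:20 we have n = 20 and x_(k) = k:

```latex
\begin{aligned}
h &= (n-1)p + 1, \quad j = \lfloor h \rfloor, \quad
  Q(p) = x_{(j)} + (h - j)\bigl(x_{(j+1)} - x_{(j)}\bigr) \\
p = 0.05:\; h &= 19(0.05) + 1 = 1.95,\; j = 1,\; Q = 1 + 0.95(2 - 1) = 1.95 \\
p = 0.95:\; h &= 19(0.95) + 1 = 19.05,\; j = 19,\; Q = 19 + 0.05(20 - 19) = 19.05
\end{aligned}
```

round() turns 1.95 and 19.05 into 2 and 19, which is why 1 becomes 2 and 20 becomes 19 in d2. Without round(), squish() would cap at 1.95 and 19.05 instead.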


answered Oct 16 '22 01:10

Ricardo Saporta

This would do it in base R, without any extra packages:


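fun <- function(x){
    # 5th and 95th percentiles; na.rm = TRUE lets quantile() skip missing values
    quantiles <- quantile( x, c(.05, .95 ), na.rm = TRUE )
    x[ x < quantiles[1] ] <- quantiles[1]   # raise values below the 5th percentile
    x[ x > quantiles[2] ] <- quantiles[2]   # lower values above the 95th percentile
    x
}
fun( yourdata )   # yourdata: a numeric vector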
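This kind of capping is often called winsorizing. Since the question mentions a whole dataset, here is a minimal sketch, assuming the data sit in a data frame and that every numeric column should be capped at its own 5th and 95th percentiles. winsorize_vec mirrors the fun() answer above (with na.rm = TRUE), while winsorize_df and the toy data frame are hypothetical names used only for illustration.

```r
# Cap each numeric column at its own 5th/95th percentiles.
# winsorize_vec(), winsorize_df() and `toy` are hypothetical, for illustration.
winsorize_vec <- function(x, probs = c(.05, .95)) {
  q <- quantile(x, probs, na.rm = TRUE)  # lower and upper bounds
  x[x < q[1]] <- q[1]                    # raise the low tail
  x[x > q[2]] <- q[2]                    # lower the high tail
  x
}

winsorize_df <- function(df, probs = c(.05, .95)) {
  num <- vapply(df, is.numeric, logical(1))  # which columns are numeric
  df[num] <- lapply(df[num], winsorize_vec, probs = probs)
  df                                         # non-numeric columns untouched
}

toy <- data.frame(id = letters[1:20], value = 1:20)
out <- winsorize_df(toy)
range(out$value)  # 1.95 19.05 (unrounded quantiles of 1:20)
```

Each column gets its own percentiles, so a column with a wide spread is not capped at the bounds of a narrow one. Integer columns come back as doubles because the quantiles are generally non-integer.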

answered Oct 16 '22 02:10

Romain Francois
