
Use dplyr to truncate top and bottom percentiles of a numeric variable

Tags:

r

dplyr

I've generated a survey weight. Because outlying survey weights can lead to very large variances, I'm following a recommendation found in many statistics textbooks: I want to truncate the top 5% and bottom 5% of the survey weights. I would like to use dplyr for this.

# Generate data
data <- data.frame(id = seq_len(2000),
                   weight = rnorm(2000, mean = 3.16, sd = 1.355686))

# This is how far I got
library(dplyr)
data2 <- data %>% mutate(perc.weight = percent_rank(weight)) %>%
                  mutate(perc.weight > 0.95 | perc.weight < 0.05)

After this, I have two new variables. The first gives the percent ranks of the weights. The second indicates whether a value falls outside the target range.

Now I want to replace the weights above the 95th percentile and the weights below the 5th percentile with the weight values that form the boundaries of those percentiles.

I would be thankful for any help!

SEMson asked Jan 14 '15 15:01



1 Answer

You can use the quantile function together with pmin and pmax:

data %>% mutate(weight_trunc = pmin(pmax(weight, quantile(weight, .05)), 
                                          quantile(weight, .95)))
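
To check that this clamping does what the question asks, here is a minimal base R sketch of the same pmin/pmax idea. The helper name truncate_weights, the fixed seed, and the na.rm = TRUE choice are assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical helper: clamp a numeric vector to its own lower/upper quantiles.
truncate_weights <- function(x, lower = 0.05, upper = 0.95) {
  # Compute both cut-offs once; names = FALSE drops the "5%"/"95%" labels.
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  # pmax() raises values below the lower bound,
  # pmin() lowers values above the upper bound.
  pmin(pmax(x, bounds[1]), bounds[2])
}

set.seed(42)  # assumption: fixed seed so the example is reproducible
data <- data.frame(id = seq_len(2000),
                   weight = rnorm(2000, mean = 3.16, sd = 1.355686))
data$weight_trunc <- truncate_weights(data$weight)

# The truncated range should coincide with the 5% and 95% quantiles
# of the original weights.
range(data$weight_trunc)
quantile(data$weight, c(0.05, 0.95))
```

Inside a dplyr pipeline the same helper could be called as mutate(weight_trunc = truncate_weights(weight)). Values between the two cut-offs are left unchanged, and missing weights stay missing.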
shadow answered Sep 29 '22 13:09