I've generated a survey weight. Because outlier survey weights can lead to very large variances, I'm following advice from many statistics books: I want to truncate the top 5% and bottom 5% of the survey weights. I would like to use dplyr for this.
library(dplyr)
# generate example data: 2000 ids with normally distributed weights
data <- data.frame(id = 1:2000, weight = rnorm(2000, mean = 3.16, sd = 1.355686))
# This is how far I got
data2 <- data %>%
  mutate(perc.weight = percent_rank(weight)) %>%
  # flag weights outside the 5th to 95th percentile range
  mutate(out.of.range = perc.weight > 0.95 | perc.weight < 0.05)
After this, I have two new variables. The first gives the percent rank of each weight. The second shows whether a value falls outside the target range.
Now I want to replace the weights in the 95th to 100th percentile and the weights in the 0th to 5th percentile with the weight values that form the borders of those percentiles.
I would be thankful for any help!
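The percent-rank idea from the question can be finished without quantile(). Below is a minimal base-R sketch, assuming the weights contain no NA values. The helpers pct_rank() and truncate_by_rank() are hypothetical names; pct_rank() rescales the minimum rank to [0, 1] in the same way dplyr's percent_rank() does, and the borders are taken as the smallest and largest weights whose percent rank lies inside [0.05, 0.95].

```r
# Hypothetical helper: percent rank in [0, 1], computed by rescaling the
# minimum rank (ties share the lowest rank). Assumes x contains no NA values.
pct_rank <- function(x) {
  (rank(x, ties.method = "min") - 1) / (length(x) - 1)
}

# Hypothetical helper: weights with a percent rank below `lower` or above
# `upper` are replaced by the nearest weight whose rank lies inside the range.
truncate_by_rank <- function(w, lower = 0.05, upper = 0.95) {
  pr <- pct_rank(w)
  lo <- min(w[pr >= lower])  # lower border: smallest weight left unchanged
  hi <- max(w[pr <= upper])  # upper border: largest weight left unchanged
  pmin(pmax(w, lo), hi)
}

w <- c(0.2, 2:20, 35)
truncate_by_rank(w)
# 0.2 becomes 2 and 35 becomes 20; all other weights are unchanged
```

Because quantile() interpolates between observations by default, its cutoffs can differ slightly from these borders, which are always observed weights.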
The quantile() function in R computes sample quantiles of a numeric vector for probabilities in [0, 1]. For example, the first quartile is at 0.25 (25%), the second quartile (the median) is at 0.50 (50%), and the third quartile is at 0.75 (75%).
You can use quantile() to calculate percentiles: it returns a vector of percentile values named by their percentages. In its default form it returns the 0th, 25th, 50th, 75th, and 100th percentiles.
Its main arguments are x, the numeric data whose percentiles are calculated, and probs, the vector of probabilities (percentile levels) to compute.
Note: the median splits the data into two equal halves; the lower quartile lies roughly in the middle of the lower half and the upper quartile roughly in the middle of the upper half.
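A small illustration of quantile() on the integers 1 to 11, chosen because R's default interpolation gives round numbers there. The probs argument selects the percentile levels, and the names of the result carry the percentages.

```r
x <- 1:11

# Default: 0th, 25th, 50th, 75th and 100th percentiles
q_default <- quantile(x)
print(q_default)
# 0%: 1, 25%: 3.5, 50%: 6, 75%: 8.5, 100%: 11

# Only the 5th and 95th percentiles, the cutoffs used for truncating weights
q_tails <- quantile(x, probs = c(0.05, 0.95))
print(q_tails)
# 5%: 1.5, 95%: 10.5
```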
You can use the quantile function together with pmin and pmax:
# clamp each weight to the range [5th percentile, 95th percentile]
data %>% mutate(weight_trunc = pmin(pmax(weight, quantile(weight, .05)),
                                    quantile(weight, .95)))
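The same clamp can be written without dplyr. This is a sketch wrapped in a hypothetical helper, truncate_weights(), that also tolerates missing weights by passing na.rm = TRUE to quantile(); replacing extreme values with percentile cutoffs like this is often called winsorizing.

```r
# Hypothetical helper: values below the `lower` quantile are set to that
# quantile, values above the `upper` quantile are set to that quantile.
# NA weights are ignored when computing the cutoffs and stay NA.
truncate_weights <- function(w, lower = 0.05, upper = 0.95) {
  cuts <- quantile(w, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(w, cuts[1]), cuts[2])
}

set.seed(42)
data <- data.frame(id = 1:2000, weight = rnorm(2000, mean = 3.16, sd = 1.355686))
data$weight_trunc <- truncate_weights(data$weight)

# The truncated range should match the 5th and 95th percentiles
range(data$weight_trunc)
quantile(data$weight, c(0.05, 0.95))
```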