I have a column in my data frame in R, data$height. The values range from 0 to 400. I want to normalize the values in the column so that the resulting values lie between 0 and 1 and are quantiles, i.e. the median value in the dataset should map to 0.5.
Any suggestions on how to do this?
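The R function ppoints is the usual way to generate the probability points used for percentile ranks. It returns the points in increasing order, so index its result by rank(x) to assign them to your values.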
See its a argument:
Setting a=1 takes the smallest value to 0 and the largest value to 1.
Setting a=0 takes the smallest value to 1/(n+1) and the largest value to n/(n+1).
By default it uses a=3/8 (if n is 10 or less) or a=1/2 (when n is larger than 10).
This function is used by other functions in R. For example it is called by qqnorm to do normal quantile-quantile plots.
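A minimal sketch of how the a argument changes the endpoints, assuming a toy sample size of n = 5 and made-up heights (neither comes from the question). ppoints(n, a) computes (i - a)/(n + 1 - 2a) for i = 1..n, so the last lines index the result by rank to attach the points to unsorted data.

```r
n <- 5  # hypothetical sample size, for illustration only

p1 <- ppoints(n, a = 1)  # (i - 1)/(n - 1): runs from 0 to 1
p0 <- ppoints(n, a = 0)  # i/(n + 1): runs from 1/(n + 1) to n/(n + 1)
pd <- ppoints(n)         # default a = 3/8 here because n <= 10

# ppoints() returns its points in increasing order, so index by rank
# to map them onto unsorted values (assumes no ties, so ranks are whole numbers)
x <- c(106.2, 148.8, 229.1, 363.3, 80.7)  # made-up heights
q <- ppoints(length(x), a = 1)[rank(x)]
```

With a = 1 the middle observation of an odd-sized sample lands exactly on 0.5, which is what the question asks for.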
You want some kind of rank, for example:
> set.seed(1)
> exdf <- data.frame(height = runif(5, min=0, max=400))
> exdf$r1 <- (rank(exdf$height) - 1) / (length(exdf$height)-1)
> exdf$r2 <- (rank(exdf$height)-1/2) / length(exdf$height)
> exdf
     height   r1  r2
1 106.20347 0.25 0.3
2 148.84956 0.50 0.5
3 229.14135 0.75 0.7
4 363.28312 1.00 0.9
5  80.67277 0.00 0.1
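The two rank formulas above can be folded into one reusable function. This is only a sketch: quantile_rank is a hypothetical helper name, not a base R function, and it assumes at least two non-missing values. With a = 1 it reproduces r1, and with a = 1/2 it reproduces r2.

```r
# Hypothetical helper (not part of base R): rank-based rescaling of x.
# a = 1 maps min -> 0 and max -> 1; a = 1/2 gives (rank - 1/2) / n.
quantile_rank <- function(x, a = 1) {
  n <- sum(!is.na(x))             # count only non-missing values
  r <- rank(x, na.last = "keep")  # ties get average ranks; NA stays NA
  (r - a) / (n + 1 - 2 * a)
}

set.seed(1)
h <- runif(5, min = 0, max = 400)  # same example data as above
r1 <- quantile_rank(h)             # a = 1
r2 <- quantile_rank(h, a = 1/2)
```

On the question's data this would be applied as data$height_q <- quantile_rank(data$height), where height_q is an illustrative column name. Tied heights share an averaged rank, so they receive the same value.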