I see a lot of questions and answers re order
and sort
. Is there anything that sorts vectors or data frames into groupings (like quartiles or deciles)? I have a "manual" solution, but there's likely a better solution that has been group-tested.
Here's my attempt:
temp <- data.frame(name=letters[1:12], value=rnorm(12), quartile=rep(NA, 12)) temp # name value quartile # 1 a 2.55118169 NA # 2 b 0.79755259 NA # 3 c 0.16918905 NA # 4 d 1.73359245 NA # 5 e 0.41027113 NA # 6 f 0.73012966 NA # 7 g -1.35901658 NA # 8 h -0.80591167 NA # 9 i 0.48966739 NA # 10 j 0.88856758 NA # 11 k 0.05146856 NA # 12 l -0.12310229 NA temp.sorted <- temp[order(temp$value), ] temp.sorted$quartile <- rep(1:4, each=12/4) temp <- temp.sorted[order(as.numeric(rownames(temp.sorted))), ] temp # name value quartile # 1 a 2.55118169 4 # 2 b 0.79755259 3 # 3 c 0.16918905 2 # 4 d 1.73359245 4 # 5 e 0.41027113 2 # 6 f 0.73012966 3 # 7 g -1.35901658 1 # 8 h -0.80591167 1 # 9 i 0.48966739 3 # 10 j 0.88856758 4 # 11 k 0.05146856 2 # 12 l -0.12310229 1
Is there a better (cleaner/faster/one-line) approach? Thanks!
To calculate a quartile in R, set the percentile as parameter of the quantile function. You can use many of the other features of the quantile function which we described in our guide on how to calculate percentile in R.
To find the decile, first order the data from least to greatest. Then, divide the data by 10. This indicates the number of observed values within each decile.
There's a handy ntile
function in package dplyr
. It's flexible in the sense that you can very easily define the number of *tiles or "bins" you want to create.
Load the package (install first if you haven't) and add the quartile column:
library(dplyr) temp$quartile <- ntile(temp$value, 4)
Or, if you want to use dplyr syntax:
temp <- temp %>% mutate(quartile = ntile(value, 4))
Result in both cases is:
temp # name value quartile #1 a -0.56047565 1 #2 b -0.23017749 2 #3 c 1.55870831 4 #4 d 0.07050839 2 #5 e 0.12928774 3 #6 f 1.71506499 4 #7 g 0.46091621 3 #8 h -1.26506123 1 #9 i -0.68685285 1 #10 j -0.44566197 2 #11 k 1.22408180 4 #12 l 0.35981383 3
Note that you don't need to create the "quartile" column in advance and use set.seed
to make the randomization reproducible:
set.seed(123) temp <- data.frame(name=letters[1:12], value=rnorm(12))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With