How to quickly form groups (quartiles, deciles, etc) by ordering column(s) in a data frame

I see a lot of questions and answers about order and sort. Is there a function that sorts vectors or data frames into groups (like quartiles or deciles)? I have a "manual" solution, but there's likely a better one that has been group-tested.

Here's my attempt:

temp <- data.frame(name=letters[1:12], value=rnorm(12), quartile=rep(NA, 12))
temp
#    name       value quartile
# 1     a  2.55118169       NA
# 2     b  0.79755259       NA
# 3     c  0.16918905       NA
# 4     d  1.73359245       NA
# 5     e  0.41027113       NA
# 6     f  0.73012966       NA
# 7     g -1.35901658       NA
# 8     h -0.80591167       NA
# 9     i  0.48966739       NA
# 10    j  0.88856758       NA
# 11    k  0.05146856       NA
# 12    l -0.12310229       NA

temp.sorted <- temp[order(temp$value), ]
temp.sorted$quartile <- rep(1:4, each=12/4)
temp <- temp.sorted[order(as.numeric(rownames(temp.sorted))), ]
temp
#    name       value quartile
# 1     a  2.55118169        4
# 2     b  0.79755259        3
# 3     c  0.16918905        2
# 4     d  1.73359245        4
# 5     e  0.41027113        2
# 6     f  0.73012966        3
# 7     g -1.35901658        1
# 8     h -0.80591167        1
# 9     i  0.48966739        3
# 10    j  0.88856758        4
# 11    k  0.05146856        2
# 12    l -0.12310229        1

Is there a better (cleaner/faster/one-line) approach? Thanks!

Richard Herron asked Nov 08 '10 17:11


1 Answer

There's a handy ntile function in the dplyr package. It's flexible in the sense that you can very easily define the number of *tiles or "bins" you want to create.

Load the package (install first if you haven't) and add the quartile column:

library(dplyr)
temp$quartile <- ntile(temp$value, 4)

Or, if you want to use dplyr syntax:

temp <- temp %>% mutate(quartile = ntile(value, 4)) 

The result in both cases is:

temp
#   name       value quartile
#1     a -0.56047565        1
#2     b -0.23017749        2
#3     c  1.55870831        4
#4     d  0.07050839        2
#5     e  0.12928774        3
#6     f  1.71506499        4
#7     g  0.46091621        3
#8     h -1.26506123        1
#9     i -0.68685285        1
#10    j -0.44566197        2
#11    k  1.22408180        4
#12    l  0.35981383        3
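If you would rather not load a package, the same rank-then-split idea can be sketched in base R. The helper name rank_bins below is hypothetical (it is not part of base R or dplyr), and this is only a rough equivalent: when the number of rows is not a multiple of the number of bins, the group sizes it produces may differ from what ntile gives.

```r
# rank_bins() is a hypothetical helper, not a base R or dplyr function.
# It ranks the values (ties broken by position) and maps ranks 1..N
# onto bins 1..n of roughly equal size. NAs stay NA.
rank_bins <- function(x, n) {
  r <- rank(x, ties.method = "first", na.last = "keep")
  as.integer(ceiling(r * n / sum(!is.na(x))))
}

set.seed(123)
temp <- data.frame(name = letters[1:12], value = rnorm(12))
temp$quartile <- rank_bins(temp$value, 4)

# Each of the four groups should hold 12 / 4 = 3 rows.
table(temp$quartile)
```

Changing the second argument (for example rank_bins(temp$value, 3)) gives a different number of groups, and the original row order is never disturbed, so no re-sorting by row name is needed.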

data:

Note that you don't need to create the "quartile" column in advance. I also used set.seed to make the random values reproducible:

set.seed(123)
temp <- data.frame(name=letters[1:12], value=rnorm(12))
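For comparison, a common base R alternative is to cut the values at their empirical quantiles. This is a sketch under the assumption that the values are distinct: it bins by value cut points rather than by rank, so with many tied values the groups can be unequal in size, and cut() stops with an error if two break points coincide.

```r
set.seed(123)
temp <- data.frame(name = letters[1:12], value = rnorm(12))

# Cut points at 0%, 25%, 50%, 75% and 100% of the value column.
breaks <- quantile(temp$value, probs = seq(0, 1, by = 0.25))

# labels = FALSE returns integer bin codes instead of interval labels;
# include.lowest = TRUE keeps the minimum value inside the first bin.
temp$quartile <- cut(temp$value, breaks = breaks,
                     labels = FALSE, include.lowest = TRUE)

# Number of rows per quartile.
table(temp$quartile)
```

For deciles, the same pattern applies with probs = seq(0, 1, by = 0.1).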

talat answered Sep 21 '22 16:09