
Computing Quantiles for a column in R to subset

Tags:

r

I have a data set with the following structure:

Name=c("a","b","c")
Amount_Spent=c(386407,213918,212006)

What I am trying to do is work out which quartile each Amount_Spent value falls into and store the result in a new variable (column), Quantiles. I have not been able to get this result with any of the apply functions. Can someone help, please?

Thanks in advance, Raoul

RTD asked Apr 18 '14 11:04



2 Answers

You can do this using cut and quantile.

# some data
df <- data.frame(name=letters , am.spent = rnorm(26))

# divide df$am.spent into quartiles
df$qnt <- cut(df$am.spent, breaks=quantile(df$am.spent),
              labels=1:4, include.lowest=TRUE)

# check ranges
tapply(df$am.spent, df$qnt, range)

First, get the quartile boundaries with quantile(df$am.spent):

#        0%        25%        50%        75%       100% 
#-3.5888426 -0.6879445 -0.1461107  0.5835165  1.2030989 


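Then use cut to divide df$am.spent at the specified cutpoints. Here we cut at the values of the quantiles, which are passed through the breaks argument. Setting include.lowest=TRUE keeps the minimum value in the first interval instead of turning it into NA.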
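
Applied to the data from the question, the same idea might look like the sketch below. The column name Quantiles comes from the question; the intermediate variable breaks is only an illustrative name.

```r
# The data from the question
Name <- c("a", "b", "c")
Amount_Spent <- c(386407, 213918, 212006)
df <- data.frame(Name, Amount_Spent)

# Quartile boundaries (0%, 25%, 50%, 75%, 100%)
breaks <- quantile(df$Amount_Spent)

# Label each amount with the quartile (1 to 4) it falls into
df$Quantiles <- cut(df$Amount_Spent, breaks = breaks,
                    labels = 1:4, include.lowest = TRUE)
df
```

With only three values, a lands in quartile 4, c in quartile 1, and b, which is exactly the median, in quartile 2, because cut uses right-closed intervals by default.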

user20650 answered Nov 15 '22 05:11


The answer you get depends on how finely you want to cut the quantiles. Do you want quartiles (25% increments), deciles (10% increments), or percentiles (1% increments)?

I have a feeling there's an easier way to do this, but here's one approach.

df           <- data.frame(Name,Amount_Spent)
q            <- quantile(df$Amount_Spent,probs=seq(0,1,0.01))  # percentiles
# function to retrieve the closest quantile for a given value
# (reads the global q; which.min returns a single name even when there are ties)
get.quantile <- function(x) names(q)[which.min(abs(q-x))]
# apply this function to all values in df$Amount_Spent
df$Quantile  <- sapply(df$Amount_Spent,get.quantile)
df
#   Name Amount_Spent Quantile
# 1    a       386407     100%
# 2    b       213918      50%
# 3    c       212006       0%

Here is a slightly more interesting example:

set.seed(1)
df <- data.frame(Name=letters,Amount_Spent=runif(26,2e5,4e5))
# recompute the percentiles for the new data (get.quantile reads the global q)
q <- quantile(df$Amount_Spent,probs=seq(0,1,0.01))
df$Quantile <- sapply(df$Amount_Spent,get.quantile)
head(df)

#   Name Amount_Spent Quantile
# 1    a     253101.7      24%
# 2    b     274424.8      32%
# 3    c     314570.7      52%
# 4    d     381641.6      88%
# 5    e     240336.4      12%
# 6    f     379677.9      84%
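
The choice between quartiles, deciles and percentiles raised at the start of this answer can also be expressed by varying the probs passed to quantile and binning with cut. The following is a minimal sketch, not the method used above: the helper name bin_by_quantile and the columns Quartile and Decile are hypothetical, and it assumes the data have no heavy ties.

```r
# Hypothetical helper (not from the original answers): assign each value
# to one of n_bins quantile-based bins using quantile() and cut().
bin_by_quantile <- function(x, n_bins) {
  # n_bins + 1 equally spaced probabilities from 0 to 1
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1))
  # cut() stops with an error if the breaks are not unique,
  # which can happen when the data contain many tied values
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

set.seed(1)
df <- data.frame(Name = letters, Amount_Spent = runif(26, 2e5, 4e5))
df$Quartile <- bin_by_quantile(df$Amount_Spent, 4)   # bins 1 to 4
df$Decile   <- bin_by_quantile(df$Amount_Spent, 10)  # bins 1 to 10
head(df)
```

Unlike get.quantile, which reports the nearest percentile label, this returns the integer index of the bin each value falls into.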

jlhoward answered Nov 15 '22 04:11