I have a data set with the following structure:
Name=c("a","b","c")
Amount_Spent=c(386407,213918,212006)
What I am trying to do is compute which quartile the Amount_Spent falls under for each name and assign the value to a new variable (column) called Quantiles. I am not able to use any of the apply functions to get this result; can someone help, please?
Thanks in advance, Raoul
To group data, we use the dplyr package. It provides a function called group_by(), to which the column to group by is passed. To find quantiles of the grouped data, we then call summarize() with the quantile() function.
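The same per-group computation can also be done without dplyr. The sketch below swaps in base R's split() and sapply() instead of group_by()/summarize(); the group and value columns and their data are made up for illustration.

```r
# Hypothetical example data: two groups with five values each
df <- data.frame(group = rep(c("x", "y"), each = 5),
                 value = c(1:5, 11:15))

# Split the values by group and compute each group's quartiles.
# The result is a matrix: one row per probability, one column per group.
grouped_q <- sapply(split(df$value, df$group), quantile,
                    probs = c(0.25, 0.5, 0.75))
grouped_q
#     x  y
# 25% 2 12
# 50% 3 13
# 75% 4 14
```

With dplyr installed, the equivalent would be group_by(group) followed by summarize() with one column per probability, e.g. q25 = quantile(value, 0.25).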
We often divide the distribution at 99 centiles, or percentiles. The median is thus the 50th centile. For the 20th centile of FEV1, i = 0.2 × 58 = 11.6, so the quantile lies between the 11th and 12th observations, 3.42 and 3.48, and can be estimated as 3.42 + (3.48 - 3.42) × (11.6 - 11) = 3.456 ≈ 3.46.
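The interpolation in the FEV1 example can be sketched as a small R function. This rests on an assumption: the position "0.2 times 58" is read as i = p(n + 1) with n = 57 observations, which is the rule R's quantile() applies with type = 6. The function name centile_interp and the sample vector are hypothetical.

```r
# Hypothetical helper: linear interpolation between order statistics at
# position i = p * (n + 1) (assumed to be the rule behind "0.2 times 58").
centile_interp <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  i <- p * (n + 1)          # fractional rank, e.g. 0.2 * 58 = 11.6
  j <- floor(i)             # index of the lower neighbouring observation
  if (j < 1) return(x[1])   # clamp below the first observation
  if (j >= n) return(x[n])  # clamp above the last observation
  x[j] + (x[j + 1] - x[j]) * (i - j)
}

# The arithmetic from the text: 3.42 + (3.48 - 3.42) * (11.6 - 11)
round(3.42 + (3.48 - 3.42) * (11.6 - 11), 2)
# [1] 3.46

# On made-up data the sketch agrees with R's type-6 quantile
x <- c(2, 4, 7, 8, 10, 15, 20)
centile_interp(x, 0.2)
# [1] 3.2
quantile(x, 0.2, type = 6)
```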
To calculate a quartile in R, pass the corresponding probability (0.25, 0.5 or 0.75) to the probs argument of the quantile function. Many of the quantile function's other features, described in our guide on how to calculate percentiles in R, apply here as well.
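A minimal sketch of the quartile call on a made-up vector. Note that quantile() defaults to type = 7, which interpolates differently from the FEV1 rule above, so results can differ slightly between types.

```r
x <- 1:9   # hypothetical data

# Quartile cut points (25%, 50%, 75%) with R's default type = 7
quantile(x, probs = c(0.25, 0.5, 0.75))   # 25% = 3, 50% = 5, 75% = 7

# The default probs give the minimum, the three quartiles and the maximum
quantile(x)
```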
You can do this using cut and quantile.
# some data: 26 names with random spending amounts
df <- data.frame(name = letters, am.spent = rnorm(26))
# divide df$am.spent into quartiles labelled 1 to 4
df$qnt <- cut(df$am.spent, breaks = quantile(df$am.spent),
              labels = 1:4, include.lowest = TRUE)
# check the range of values in each quartile
tapply(df$am.spent, df$qnt, range)
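Applied to the question's own data, a minimal sketch of the same cut()/quantile() idea looks like this. It assumes integer quartile numbers are wanted, so it passes labels = FALSE (instead of labels = 1:4), which makes cut() return plain integers rather than a factor; the column name Quantiles follows the question.

```r
# The data from the question
df <- data.frame(Name = c("a", "b", "c"),
                 Amount_Spent = c(386407, 213918, 212006))

# Quartile boundaries: min, 25%, 50%, 75%, max
breaks <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.25))

# Assign each row the number (1-4) of the quartile it falls into;
# include.lowest = TRUE keeps the minimum value in quartile 1
df$Quantiles <- cut(df$Amount_Spent, breaks = breaks,
                    labels = FALSE, include.lowest = TRUE)
df
#   Name Amount_Spent Quantiles
# 1    a       386407         4
# 2    b       213918         2
# 3    c       212006         1
```

With only three rows, one quartile is necessarily empty; here no row lands in quartile 3.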
First, get the quantiles:
quantile(df$am.spent)
# 0% 25% 50% 75% 100%
#-3.5888426 -0.6879445 -0.1461107 0.5835165 1.2030989
Then use cut to divide df$am.spent at the specified cut points; here we cut at the values of the quantiles, which are passed via the breaks argument.
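The include.lowest = TRUE argument matters because cut() builds right-closed intervals such as (a, b] by default, so the smallest value, which is exactly the 0% quantile, would otherwise fall outside every interval. A small illustration on made-up data:

```r
x <- c(1, 2, 3, 4, 5)
q <- quantile(x)   # breaks at 1, 2, 3, 4, 5

# Without include.lowest the minimum is not in any (a, b] interval, giving NA
cut(x, breaks = q, labels = FALSE)
# [1] NA  1  2  3  4

# With include.lowest = TRUE the first interval becomes [1, 2]
cut(x, breaks = q, labels = FALSE, include.lowest = TRUE)
# [1] 1 1 2 3 4
```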
The answer you get depends on how finely you want to cut the distribution. Do you want quartiles (25% increments), deciles (10% increments) or percentiles (1% increments)?
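Whichever granularity you choose, only the probs vector changes. A hypothetical helper, assign_bin(), sketches this by splitting a numeric vector into k equal-probability groups (k = 4 for quartiles, 10 for deciles, 100 for percentiles).

```r
# Hypothetical helper: return the group number (1..k) of each value of x,
# using the k + 1 sample quantiles as cut points.
assign_bin <- function(x, k) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
  # cut() stops with "'breaks' are not unique" if quantiles coincide,
  # which can happen with heavily tied data or a large k relative to length(x)
  cut(x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

x <- 1:20
table(assign_bin(x, 4))   # quartiles: 5 values in each group
table(assign_bin(x, 10))  # deciles: 2 values in each group
```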
I have a feeling there's an easier way to do this, but here's one approach.
df <- data.frame(Name, Amount_Spent)   # Name and Amount_Spent as defined in the question
q <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.01))   # percentiles
# function to retrieve the closest percentile for a given value
# (which.min returns a single name, the first one if two are equally close)
get.quantile <- function(x) names(q)[which.min(abs(q - x))]
# apply this function to all values in df$Amount_Spent
df$Quantile <- sapply(df$Amount_Spent, get.quantile)
df
# Name Amount_Spent Quantile
# 1 a 386407 100%
# 2 b 213918 50%
# 3 c 212006 0%
Here is a slightly more interesting example:
set.seed(1)
df <- data.frame(Name = letters, Amount_Spent = runif(26, 2e5, 4e5))
q <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.01))
df$Quantile <- sapply(df$Amount_Spent, get.quantile)
head(df)
# Name Amount_Spent Quantile
# 1 a 253101.7 24%
# 2 b 274424.8 32%
# 3 c 314570.7 52%
# 4 d 381641.6 88%
# 5 e 240336.4 12%
# 6 f 379677.9 84%
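One possibly easier alternative to get.quantile() is base R's ecdf(). This is a different technique, swapped in here rather than taken from the answer above: it returns the empirical cumulative distribution function, i.e. the fraction of observations less than or equal to each value. That is a percentile rank, not the nearest percentile cut point, so it will not reproduce the Quantile column above exactly.

```r
Amount_Spent <- c(386407, 213918, 212006)   # data from the question

# ecdf() returns a function; calling it on the data gives, for each value,
# the proportion of observations <= that value
Fn <- ecdf(Amount_Spent)
Fn(Amount_Spent)
# [1] 1.0000000 0.6666667 0.3333333
```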