Computing Quantiles for a column in R to subset

Tags:

r

I have a data set with the following structure:

Name=c("a","b","c")
Amount_Spent=c(386407,213918,212006)

What I am trying to do is compute which quartile the Amount_Spent falls under for each name and assign the value to a new variable (column) Quantiles. I have not been able to get this result with any of the apply functions. Can someone help, please?

Thanks in advance, Raoul


asked Apr 18 '14 11:04

RTD

2 Answers

You can do this using cut and quantile.

# some data
df <- data.frame(name = letters, am.spent = rnorm(26))

# divide df$am.spent
df$qnt <- cut(df$am.spent, breaks = quantile(df$am.spent),
              labels = 1:4, include.lowest = TRUE)

# check ranges
tapply(df$am.spent, df$qnt, range)

First get the quantiles with quantile(df$am.spent):

#        0%        25%        50%        75%       100% 
#-3.5888426 -0.6879445 -0.1461107  0.5835165  1.2030989

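Then use cut to divide df$am.spent at the specified cutpoints, which here are the quantile values. They are passed through the breaks argument. Setting include.lowest to TRUE keeps the minimum value in the first interval instead of turning it into NA.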
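As a minimal sketch, here is the same cut/quantile idea applied to the three values from the question. The column name Quantiles follows the question; the intermediate variable breaks is only an illustrative name.

```r
# data from the question
df <- data.frame(Name = c("a", "b", "c"),
                 Amount_Spent = c(386407, 213918, 212006))

# quartile cutpoints: 0%, 25%, 50%, 75%, 100%
breaks <- quantile(df$Amount_Spent)

# label each value with the quartile (1 to 4) it falls into
df$Quantiles <- cut(df$Amount_Spent, breaks = breaks,
                    labels = 1:4, include.lowest = TRUE)
df
```

Note that cut stops with an error if the breaks are not unique, which can happen when the column contains many tied values, so this sketch assumes the quartile cutpoints are all distinct.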


answered Nov 15 '22 05:11

user20650

The answer you get depends on how finely you want to cut the quantiles. Do you want quartiles (25% increments), deciles (10% increments), or percentiles (1% increments)?

I have a feeling there's an easier way to do this, but here's one approach.

df           <- data.frame(Name, Amount_Spent)
q            <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.01))  # percentiles
# function to retrieve the name of the closest quantile for a given value
# (which.min returns a single index, so ties resolve to the first match)
get.quantile <- function(x) names(q)[which.min(abs(q - x))]
# apply this function to all values in df$Amount_Spent
df$Quantile  <- sapply(df$Amount_Spent, get.quantile)
df
#   Name Amount_Spent Quantile
# 1    a       386407     100%
# 2    b       213918      50%
# 3    c       212006       0%

Here is a slightly more interesting example:

set.seed(1)
df <- data.frame(Name = letters, Amount_Spent = runif(26, 2e5, 4e5))
q <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.01))  # recompute percentiles for the new data
df$Quantile <- sapply(df$Amount_Spent, get.quantile)
head(df)

#   Name Amount_Spent Quantile
# 1    a     253101.7      24%
# 2    b     274424.8      32%
# 3    c     314570.7      52%
# 4    d     381641.6      88%
# 5    e     240336.4      12%
# 6    f     379677.9      84%
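If coarser groups are enough, the granularity can be changed through the probs argument of quantile. The following is only a sketch of that idea using deciles and cut instead of the nearest-percentile lookup above; the names dec_breaks and Decile are made up for illustration.

```r
set.seed(1)
df <- data.frame(Name = letters, Amount_Spent = runif(26, 2e5, 4e5))

# decile cutpoints: 0%, 10%, ..., 100% (11 breaks give 10 groups)
dec_breaks <- quantile(df$Amount_Spent, probs = seq(0, 1, 0.1))

# assign each value to its decile group, 1 (lowest) to 10 (highest)
df$Decile <- cut(df$Amount_Spent, breaks = dec_breaks,
                 labels = 1:10, include.lowest = TRUE)

# number of rows per decile
table(df$Decile)
```

Replacing seq(0, 1, 0.1) with seq(0, 1, 0.25) and labels = 1:4 gives quartiles in the same way.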

answered Nov 15 '22 04:11

jlhoward
