 

How to create a column with a quartile rank?

Tags:

r

I have a table called tableOne in R like this:

    idNum   binaryVariable   salePrice
    2       1                55.56
    4       0                88.33
    15      0                 4.45
    87      1                35.77
    ...     ...              ...

I'd like to use the values produced by summary(tableOne$salePrice) to split salePrice into four quartiles. I'd then like to create a column, tableOne$quartile, holding the quartile that each row's salePrice falls in. It would look like:

    idNum   binaryVariable   salePrice   quartile
    2       1                55.56       3
    4       0                88.33       4
    15      0                 4.45       1
    87      1                35.77       2
    ...     ...              ...         ...

Any suggestions?
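For anyone reproducing this, here is a minimal sketch built only from the four sample rows above (the elided "..." rows are left out, so the numbers are illustrative). It shows that the 1st Qu., Median and 3rd Qu. values printed by summary() are the same cut points quantile() returns, which is why the answers below call quantile() directly. The names s and q are just illustrative.

```r
# Rebuild the four sample rows from the question (elided rows omitted)
tableOne <- data.frame(
  idNum          = c(2, 4, 15, 87),
  binaryVariable = c(1, 0, 0, 1),
  salePrice      = c(55.56, 88.33, 4.45, 35.77)
)

# summary() reports Min, 1st Qu., Median, Mean, 3rd Qu., Max
s <- summary(tableOne$salePrice)
print(s)

# quantile() gives the same five cut points (everything except the Mean)
q <- quantile(tableOne$salePrice, probs = c(0, 0.25, 0.5, 0.75, 1))
print(q)

# Drop the Mean (4th element of the summary) so the two vectors line up
all.equal(as.numeric(s)[-4], as.numeric(q))
```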

asked Sep 22 '11 00:09 by screechOwl





2 Answers

This should do it:

# Bin salePrice at its 0/25/50/75/100% quantiles and store the bin number (1-4)
tableOne <- within(tableOne, quartile <- as.integer(cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE)))
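Run against only the four sample rows from the question (the elided rows are left out, so this is an illustration rather than the asker's real data), the one-liner reproduces the expected quartile column 3, 4, 1, 2:

```r
# Sample rows from the question
tableOne <- data.frame(
  idNum          = c(2, 4, 15, 87),
  binaryVariable = c(1, 0, 0, 1),
  salePrice      = c(55.56, 88.33, 4.45, 35.77)
)

# Same expression as the answer, spread over several lines for readability
tableOne <- within(tableOne,
  quartile <- as.integer(cut(salePrice,
                             quantile(salePrice, probs = 0:4/4),
                             include.lowest = TRUE)))
print(tableOne)
```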

Some details:

The within function is handy for calculating new columns: inside it you can refer to columns by their bare names (salePrice) instead of writing tableOne$salePrice each time.

tableOne <- within(tableOne, quartile <- <<<some expression>>>) 
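To see what within saves you, here is a small sketch comparing it with the explicit tableOne$ form on the sample salePrice values. The names viaWithin and viaDollar are purely illustrative; both routes should produce the same data frame.

```r
tableOne <- data.frame(salePrice = c(55.56, 88.33, 4.45, 35.77))

# With within(): columns are visible as plain variables
viaWithin <- within(tableOne,
  quartile <- as.integer(cut(salePrice, quantile(salePrice, probs = 0:4/4),
                             include.lowest = TRUE)))

# Without within(): every column reference needs the tableOne$ prefix
viaDollar <- tableOne
viaDollar$quartile <- as.integer(cut(tableOne$salePrice,
                                     quantile(tableOne$salePrice, probs = 0:4/4),
                                     include.lowest = TRUE))

all.equal(viaWithin, viaDollar)
```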

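The quantile function calculates the quantiles (in your case, quartiles) at the probabilities given by probs. 0:4/4 evaluates to c(0, 0.25, 0.50, 0.75, 1), so you get the minimum, the three quartiles and the maximum: five breakpoints that bound four groups.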
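A quick check of those breakpoints on the four sample prices from the question (the variable names here are illustrative). Note that `:` binds tighter than `/`, so 0:4/4 is (0:4)/4, and the outer breakpoints are simply the minimum and maximum.

```r
salePrice <- c(55.56, 88.33, 4.45, 35.77)

# (0:4)/4 gives the five probabilities 0, 0.25, 0.5, 0.75, 1
probs <- 0:4/4
print(probs)

# Five breakpoints: min, 1st quartile, median, 3rd quartile, max
breaks <- quantile(salePrice, probs = probs)
print(breaks)

# The outer breakpoints coincide with the range of the data
c(breaks[[1]] == min(salePrice), breaks[[5]] == max(salePrice))
```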

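Finally, the cut function bins the values at those breakpoints. It returns a factor whose levels are interval labels such as "(a,b]", so as.integer turns it into group numbers 1, 2, 3, 4. Because cut's intervals are open on the left by default, include.lowest=TRUE is needed so that the minimum value, which sits exactly on the first breakpoint, lands in group 1 instead of becoming NA.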
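A small sketch of what cut returns on the sample prices, and what happens without include.lowest (f and breaks are illustrative names). The smallest price, 4.45, equals the first breakpoint and so falls outside every left-open interval unless include.lowest is set.

```r
salePrice <- c(55.56, 88.33, 4.45, 35.77)
breaks <- quantile(salePrice, probs = 0:4/4)

# cut() returns a factor whose levels are interval labels
f <- cut(salePrice, breaks, include.lowest = TRUE)
print(f)
print(levels(f))

# as.integer() keeps only the level index: 1 = lowest quartile, 4 = highest
as.integer(f)

# Without include.lowest the minimum is not inside any (a,b] interval -> NA
as.integer(cut(salePrice, breaks))
```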

Try ?within, ?quantile and ?cut to learn more about these functions.

answered Oct 11 '22 11:10 by Tommy



A data.table approach

    library(data.table)  # setDT() converts by reference; := then adds the column
    tableOne <- setDT(tableOne)[, quartile := cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE, labels=FALSE)]
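The labels=FALSE argument used here belongs to cut itself, not to data.table, so it also works in base R: it makes cut return integer bin codes directly instead of a factor. This base-R sketch (no data.table needed; codes and breaks are illustrative names) checks that it gives the same groups as as.integer(cut(...)) from the first answer.

```r
salePrice <- c(55.56, 88.33, 4.45, 35.77)
breaks <- quantile(salePrice, probs = 0:4/4)

# labels = FALSE: integer bin codes, no factor
codes <- cut(salePrice, breaks, include.lowest = TRUE, labels = FALSE)
print(codes)

# Same result as converting the factor afterwards
identical(codes, as.integer(cut(salePrice, breaks, include.lowest = TRUE)))
```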
answered Oct 11 '22 11:10 by usct01
