I have a table called tableOne in R like this:
idNum  binaryVariable  salePrice
2      1               55.56
4      0               88.33
15     0               4.45
87     1               35.77
...    ...             ...
I'd like to take the values produced by summary(tableOne$salePrice) to create four quartiles by salePrice. I'd then like to create a column, tableOne$quartile, that records which quartile each row's salePrice is in. It would look like:
idNum  binaryVariable  salePrice  quartile
2      1               55.56      3
4      0               88.33      4
15     0               4.45       1
87     1               35.77      2
...    ...             ...        ...
Any suggestions?
You can use the quantile() function to find quartiles in R. If your data is called “data”, then “quantile(data, prob=c(.25, .5, .
Quartiles often are used in sales and survey data to divide populations into groups. For example, you can use QUARTILE to find the top 25 percent of incomes in a population.
In probability and statistics, the quantile function, associated with a probability distribution of a random variable, specifies the value of the random variable such that the probability of the variable being less than or equal to that value equals the given probability.
This should do it:
tableOne <- within(tableOne, quartile <- as.integer(cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE)))
...Some details:
The within function is great for calculating new columns. You don't have to refer to columns as tableOne$salePrice etc.
tableOne <- within(tableOne, quartile <- <<<some expression>>>)
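As a small sketch of what within does here, the block below rebuilds only the four rows shown in the question (the real tableOne has more rows, so its quartiles will differ) and fills the placeholder with the expression from the answer, using the column name salePrice from the question. The second column, quartileExplicit, is a made-up name that writes the same calculation with explicit tableOne$ references for comparison.

```r
# Only the four rows visible in the question; the real table is longer.
tableOne <- data.frame(
  idNum = c(2, 4, 15, 87),
  binaryVariable = c(1, 0, 0, 1),
  salePrice = c(55.56, 88.33, 4.45, 35.77)
)

# Inside within(), salePrice is looked up in tableOne automatically.
tableOne <- within(tableOne, quartile <- as.integer(
  cut(salePrice, quantile(salePrice, probs = 0:4 / 4), include.lowest = TRUE)
))

# The same calculation with explicit tableOne$ references
# (quartileExplicit is a hypothetical column name used only for comparison).
tableOne$quartileExplicit <- as.integer(
  cut(tableOne$salePrice,
      quantile(tableOne$salePrice, probs = 0:4 / 4),
      include.lowest = TRUE)
)

print(tableOne)
```

For these four rows both columns come out as 3, 4, 1, 2, which matches the quartile column in the question.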
The quantile function calculates the quantiles (or in your case, quartiles). 0:4/4 evaluates to c(0, 0.25, 0.50, 0.75, 1).
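To make the breakpoints concrete, here is a minimal sketch using only the four sale prices visible in the question. The names prices, probs and breaks are chosen for illustration and are not part of the original answer.

```r
# Only the four sale prices shown in the question; the real table is longer.
prices <- c(55.56, 88.33, 4.45, 35.77)

# 0:4/4 is shorthand for the five probabilities 0, 0.25, 0.5, 0.75, 1.
probs <- 0:4 / 4
print(probs)

# quantile() returns five values: the minimum, the three quartile
# cut points, and the maximum.
breaks <- quantile(prices, probs = probs)
print(breaks)
```

The first and last breakpoints are simply the smallest and largest prices, so every row falls inside the range that gets cut into four groups.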
Finally, the cut function splits your data into those quartiles. But you get a factor with weird names, so as.integer turns it into groups 1, 2, 3, 4.
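As a rough illustration of that last step, the sketch below (again with only the four prices from the question, and illustrative variable names) prints the factor levels that cut produces and the integer codes that as.integer extracts. It also shows why include.lowest=TRUE is there: without it the first interval is open on the left, so the smallest price is not placed in any group and comes back as NA.

```r
prices <- c(55.56, 88.33, 4.45, 35.77)
breaks <- quantile(prices, probs = 0:4 / 4)

# cut() returns a factor whose levels are interval labels.
as_factor <- cut(prices, breaks, include.lowest = TRUE)
print(levels(as_factor))

# as.integer() keeps only the position of each level, 1 to 4.
as_codes <- as.integer(as_factor)
print(as_codes)

# Without include.lowest = TRUE the minimum price falls outside
# the first interval and is returned as NA.
without_lowest <- as.integer(cut(prices, breaks))
print(without_lowest)
```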
Try ?within etc. to learn more about the functions mentioned here.
A data.table approach
library(data.table)
tableOne <- setDT(tableOne)[, quartile := cut(salePrice, quantile(salePrice, probs=0:4/4), include.lowest=TRUE, labels=FALSE)]