I would like to do quantile cuts (cut into n bins, each with an equal number of points) separately for each group:
qcut = function(x, n) {
  quantiles = seq(0, 1, length.out = n+1)
  cutpoints = unname(quantile(x, quantiles, na.rm = TRUE))
  cut(x, cutpoints, include.lowest = TRUE)
}
library(data.table)
dt = data.table(A = 1:10, B = c(1,1,1,1,1,2,2,2,2,2))
dt[, bin := qcut(A, 3)]
dt[, bin2 := qcut(A, 3), by = B]
dt
     A B    bin        bin2
 1:  1 1  [1,4]    [6,7.33]
 2:  2 1  [1,4]    [6,7.33]
 3:  3 1  [1,4] (7.33,8.67]
 4:  4 1  [1,4]   (8.67,10]
 5:  5 1  (4,7]   (8.67,10]
 6:  6 2  (4,7]    [6,7.33]
 7:  7 2  (4,7]    [6,7.33]
 8:  8 2 (7,10] (7.33,8.67]
 9:  9 2 (7,10]   (8.67,10]
10: 10 2 (7,10]   (8.67,10]
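Here the cut without grouping is correct: every value lies in its bin. But the result by group is wrong: the rows with B = 1 (A from 1 to 5) are labeled with intervals such as [6,7.33] that do not contain them.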
How can I fix that?
In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created.
An alternative method of calculating a quantile for a given proportion p (for example, p = 0.25 for the lower quartile) is the following: (i) sort the original data in increasing order, (ii) find the (p*(n + 1))th number along. With n = 23, as in our example, p*(n + 1) = 0.25*24 = 6, so the lower quartile is the 6th value in the sorted data.
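The p*(n + 1) rule above can be checked in a few lines of R. This is only a sketch: the 23 values in `x` are made up for illustration and are not the data the text refers to. As far as the R documentation describes it, `quantile(..., type = 6)` uses the same p*(n + 1) position, interpolating between neighbours when the position is not a whole number.

```r
# Hypothetical sample with n = 23 values (illustration only), deliberately unsorted.
x = rev(seq(2, 46, by = 2))

p = 0.25                  # lower quartile
n = length(x)             # 23
pos = p * (n + 1)         # 0.25 * 24 = 6

# (i) sort in increasing order, (ii) take the value at position p*(n + 1).
q_manual = sort(x)[pos]

# quantile() with type = 6 places the quantile at the same position.
q_type6 = unname(quantile(x, p, type = 6))

print(c(position = pos, manual = q_manual, type6 = q_type6))
```

Note that R's default is `type = 7`, which uses a different position and will generally give a slightly different value.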
A quantile defines a particular part of a data set, i.e. a quantile determines how many values in a distribution are above or below a certain limit. Special quantiles are the quartile (quarter), the quintile (fifth) and percentiles (hundredth).
To place each data value into a decile, we can use the ntile(x, ngroups) function from the dplyr package in R. The way to interpret the output is as follows: the data value 56 falls between the 0% and 10% percentiles, thus it falls in the first decile.
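For readers without dplyr at hand, the idea of rank-based deciles can be sketched in base R. This is not dplyr's implementation. The data are hypothetical, with 56 chosen as the smallest value to echo the sentence above, and the formula should only be expected to agree with ntile() when the number of values is a multiple of the number of groups.

```r
# Hypothetical data: 20 values from 56 to 94 (illustration only).
x = seq(56, 94, by = 2)

ngroups = 10

# Rank each value, then map ranks 1..20 onto buckets 1..10 (two values per bucket).
r = rank(x, ties.method = "first")
decile = ceiling(r * ngroups / length(x))

print(data.frame(x = x, decile = decile))
```

With these values, 56 has rank 1 and lands in decile 1, and every decile holds exactly two values.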
This is a bug in the handling of factors. Please check whether it is already known (or fixed in the development version), and report it to the data.table bug tracker otherwise. As a workaround, return character labels instead of a factor:
qcut = function(x, n) {
  quantiles = seq(0, 1, length.out = n+1)
  cutpoints = unname(quantile(x, quantiles, na.rm = TRUE))
  as.character(cut(x, cutpoints, include.lowest = TRUE))
}
dt[, bin2 := qcut(A, 3), by = B]
#      A B    bin        bin2
#  1:  1 1  [1,4]    [1,2.33]
#  2:  2 1  [1,4]    [1,2.33]
#  3:  3 1  [1,4] (2.33,3.67]
#  4:  4 1  [1,4]   (3.67,5]
#  5:  5 1  (4,7]   (3.67,5]
#  6:  6 2  (4,7]    [6,7.33]
#  7:  7 2  (4,7]    [6,7.33]
#  8:  8 2 (7,10] (7.33,8.67]
#  9:  9 2 (7,10]   (8.67,10]
# 10: 10 2 (7,10]   (8.67,10]
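To see why returning character labels sidesteps the problem, it may help to look at the two groups in plain base R, without data.table. The sketch below reuses the question's factor-returning `qcut`. The explanation it suggests (the per-group integer codes being kept while a single group's levels are applied to all rows) is an assumption that happens to match the wrong output shown in the question, not a confirmed description of data.table internals.

```r
# Factor-returning helper, as in the question.
qcut = function(x, n) {
  quantiles = seq(0, 1, length.out = n + 1)
  cutpoints = unname(quantile(x, quantiles, na.rm = TRUE))
  cut(x, cutpoints, include.lowest = TRUE)
}

f1 = qcut(1:5, 3)     # values of A where B == 1
f2 = qcut(6:10, 3)    # values of A where B == 2

# Both groups have the same integer codes ...
codes_equal = identical(as.integer(f1), as.integer(f2))
# ... but different level labels.
levels_equal = identical(levels(f1), levels(f2))

# Pairing group 1's codes with group 2's levels reproduces the wrong labels.
mislabeled = levels(f2)[as.integer(f1)]

# Converting to character within each group keeps each label with its own value.
labels_ok = c(as.character(f1), as.character(f2))

print(levels(f1))
print(levels(f2))
print(mislabeled)
print(labels_ok)
```

Because a character vector carries its labels directly rather than as codes into a shared level set, combining the per-group results cannot mix them up.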