
new column value in data.table for ID using quantile bins

Tags: r, data.table

quantile(X, probs = seq(0, 1, length.out = 5), type = 5)

How would you turn this into a data.table operation that adds a new column with := and assigns each ID an ordered bin value according to the quantile interval its value falls in (for example, up to 25% = 1, up to 50% = 2, and so on)?

digdeep asked Dec 30 '25 01:12



1 Answer

You could use findInterval with break points computed by quantile. This lets you pick any of quantile's definitions through its type argument.

For example:

findInterval(x, quantile(x,type=5), rightmost.closed=TRUE)
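Applied to a data.table with :=, this might look like the sketch below. The column names ID, X, grp, bin and bin_by_grp and the sample data are made up for illustration; the question does not show the real table. The second assignment assumes you want the quantiles computed separately within each group, which may or may not be what "for each ID" means in your case.

```r
library(data.table)

set.seed(42)
# Hypothetical data: one row per ID, a numeric column X, and a grouping column grp
DT <- data.table(ID = 1:20, grp = rep(c("a", "b"), each = 10), X = rnorm(20))

# Break points at 0%, 25%, 50%, 75% and 100%, as in the question
p <- seq(0, 1, length.out = 5)

# Bin over the whole column, added by reference.
# rightmost.closed = TRUE keeps the maximum in bin 4 instead of opening a 5th bin.
DT[, bin := findInterval(X, quantile(X, probs = p, type = 5), rightmost.closed = TRUE)]

# Assumed variant: quantiles computed within each group
DT[, bin_by_grp := findInterval(X, quantile(X, probs = p, type = 5), rightmost.closed = TRUE),
   by = grp]
```

Both new columns hold integers from 1 to 4, with the smallest value in bin 1 and the largest in bin 4.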

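# It is fast
library(data.table)
library(microbenchmark)

set.seed(1)
DT <- data.table(x = rnorm(1e6))

# Note: the order variant produces 5 bins, the findInterval variant 4 (quartiles)
microbenchmark(
  order = DT[order(x), bin := ceiling(.I / .N * 5)],
  findInterval = DT[, b2 := findInterval(x, quantile(x, type = 5), rightmost.closed = TRUE)],
  times = 10
)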
## Unit: milliseconds
##         expr       min        lq    median       uq      max neval
##        order 551.31154 568.20324 573.36605 640.3255 655.5024    10
## findInterval  70.16782  79.11459  80.36363 140.2807 147.3080    10
mnel answered Jan 01 '26 14:01
