
data.table "key indices" or "group counter"

Tags: r, data.table

After creating a key on a data.table:

library(data.table)
set.seed(12345)
DT <- data.table(x = sample(LETTERS[1:3], 10, replace = TRUE),
                 y = sample(LETTERS[1:3], 10, replace = TRUE))
setkey(DT, x, y)
DT
#       x y
#  [1,] A B
#  [2,] A B
#  [3,] B B
#  [4,] B B
#  [5,] C A
#  [6,] C A
#  [7,] C A
#  [8,] C A
#  [9,] C C
# [10,] C C

I would like to get an integer vector giving for each row the corresponding "key index". I hope the expected output (column i) below will help clarify what I mean:

#       x y i
#  [1,] A B 1
#  [2,] A B 1
#  [3,] B B 2
#  [4,] B B 2
#  [5,] C A 3
#  [6,] C A 3
#  [7,] C A 3
#  [8,] C A 3
#  [9,] C C 4
# [10,] C C 4

I thought about using something like cumsum(!duplicated(DT[, key(DT), with = FALSE])), but I am hoping there is a better solution. I suspect this vector is already part of the table's internal representation; if so, is there a way to access it? If not, what would you suggest?
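For reference, the cumsum(!duplicated(...)) idea can be sketched as below. This is only an illustration: the table is typed in by hand to match the printed output above (an assumption, so the result does not depend on the random number generator), and the approach relies on setkey having sorted the rows so that equal key values are contiguous.

```r
library(data.table)

# Hand-typed data matching the table printed in the question (illustrative)
DT <- data.table(x = c("A", "A", "B", "B", "C", "C", "C", "C", "C", "C"),
                 y = c("B", "B", "B", "B", "A", "A", "A", "A", "C", "C"))
setkey(DT, x, y)

# TRUE on the first row of each distinct key combination, FALSE on repeats;
# the running sum of those flags is the group index for every row
i <- cumsum(!duplicated(DT[, key(DT), with = FALSE]))
i
# [1] 1 1 2 2 3 3 3 3 4 4
```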

asked Oct 22 '12 19:10 by flodel


1 Answer

Update: From data.table v1.8.3 onward, you can simply use the built-in special symbol .GRP:

DT[ , i := .GRP, by = key(DT)] 

See history for older answers.
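A minimal worked example of the .GRP approach, assuming a hand-typed table that matches the one printed in the question (the data values are an assumption made so the result is reproducible without the random seed):

```r
library(data.table)

# Hand-typed data matching the table printed in the question (illustrative)
DT <- data.table(x = c("A", "A", "B", "B", "C", "C", "C", "C", "C", "C"),
                 y = c("B", "B", "B", "B", "A", "A", "A", "A", "C", "C"))
setkey(DT, x, y)

# .GRP is an integer group counter: 1 for the first group, 2 for the second, ...
# Grouping by the key columns assigns it by reference to every row of its group
DT[, i := .GRP, by = key(DT)]
DT$i
# [1] 1 1 2 2 3 3 3 3 4 4
```

Because := modifies DT in place, no copy of the table is made and the new column i sits alongside the key columns.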

answered Sep 30 '22 14:09 by Matt Dowle