Replicating rows in data.table by column value

Question

I have a dataset that is structured as follows:

library(data.table)
data <- data.table(ID=1:10,Tenure=c(2,3,4,2,1,1,3,4,5,2),Var=rnorm(10))

    ID Tenure         Var
 1:  1      2 -0.72892371
 2:  2      3 -1.73534591
 3:  3      4  0.47007030
 4:  4      2  1.33173044
 5:  5      1 -0.07900914
 6:  6      1  0.63493316
 7:  7      3 -0.62710577
 8:  8      4 -1.69238758
 9:  9      5 -0.85709328
10: 10      2  0.10716830

I need to replicate each row N = Tenure times. For example, the first row should be replicated 2 times (since Tenure = 2).

I need my transformed dataset to look like the following:

setkey(data,ID)
print(data[,.(ID=rep(ID,Tenure))][data][, Indx := 1:.N, by=ID])

   ID Tenure        Var Indx
1:  1      2 -0.7289237    1
2:  1      2 -0.7289237    2
3:  2      3 -1.7353459    1
4:  2      3 -1.7353459    2
5:  2      3 -1.7353459    3
6:  3      4  0.4700703    1
...
...

Is there a more efficient way (a more data.table way) to do this? My way is pretty slow. I was thinking there should be a way to do this using a by-without-by merge using .EACHI?

Frank · Accepted Answer

I don't think using a key/merge is helpful here. Just expand by passing a vector of row indices:

DT <- data[rep(1:.N,Tenure)][,Indx:=1:.N,by=ID]
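To make the row-index idea concrete, here is a minimal sketch on a small made-up table (the `Var` values are placeholders, not the question's data). It spells out the index vector that `rep()` produces, and, as a variation that is not part of the answer above, fills `Indx` with base R's `sequence()` instead of a grouped `1:.N`. That variation assumes `Tenure` holds non-negative whole numbers.

```r
library(data.table)

# Small deterministic stand-in for the question's table (Var values are made up)
data <- data.table(ID = 1:4, Tenure = c(2, 3, 1, 2), Var = c(0.5, -1.2, 0.3, 2.0))

# rep() turns the per-row counts into a vector of row numbers: 1 1 2 2 2 3 4 4
idx <- rep(seq_len(nrow(data)), data$Tenure)

# Subsetting by that vector repeats each row Tenure times
DT <- data[idx]

# sequence() builds 1..Tenure for each original row, so no grouping is needed
DT[, Indx := sequence(data$Tenure)]

print(DT)
```

Because `sequence()` works directly from the counts, it skips the `by=ID` step; whether that matters for speed would depend on the data, so it is worth timing on the real table.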

Steven Beaupré · Answer

You could try:

library(splitstackshape)
expandRows(data, "Tenure", drop = FALSE)[,Indx:=1:.N,by=ID][]

Or

library(dplyr)
library(splitstackshape)
expandRows(data, "Tenure", drop = FALSE) %>% 
  group_by(ID) %>%
  mutate(Indx = row_number())

Which gives:

    ID Tenure        Var Indx
 1:  1      2 -0.8808717    1
 2:  1      2 -0.8808717    2
 3:  2      3  0.5962590    1
 4:  2      3  0.5962590    2
 5:  2      3  0.5962590    3
 6:  3      4  0.1197176    1
 7:  3      4  0.1197176    2
 8:  3      4  0.1197176    3
 9:  3      4  0.1197176    4
10:  4      2 -0.2821739    1
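Whichever approach is used, the result can be sanity-checked against the original table. The following is only a sketch with hypothetical fixed input values (the names `expanded` and `check` are illustrative): it confirms that the expanded table has `sum(Tenure)` rows and that `Indx` runs from 1 up to `Tenure` within each `ID`.

```r
library(data.table)

# Hypothetical input with fixed values so the checks are reproducible
data <- data.table(ID = 1:3, Tenure = c(2, 3, 4), Var = c(0.1, 0.2, 0.3))

# Expand with row-index replication, then number the copies within each ID
expanded <- data[rep(seq_len(.N), Tenure)][, Indx := seq_len(.N), by = ID]

# Check 1: total row count equals the sum of Tenure
stopifnot(nrow(expanded) == sum(data$Tenure))

# Check 2: each ID appears exactly Tenure times and Indx runs up to Tenure
check <- expanded[, .(n = .N, max_indx = max(Indx)), by = .(ID, Tenure)]
stopifnot(all(check$n == check$Tenure), all(check$max_indx == check$Tenure))
```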

Tags: r, data.table

Mike.Gahan
