I have a dataset that is structured as follows:
library(data.table)
data <- data.table(ID=1:10,Tenure=c(2,3,4,2,1,1,3,4,5,2),Var=rnorm(10))
ID Tenure Var
1: 1 2 -0.72892371
2: 2 3 -1.73534591
3: 3 4 0.47007030
4: 4 2 1.33173044
5: 5 1 -0.07900914
6: 6 1 0.63493316
7: 7 3 -0.62710577
8: 8 4 -1.69238758
9: 9 5 -0.85709328
10: 10 2 0.10716830
I need to replicate each row N = Tenure times, e.g. the first row should be replicated 2 times (since its Tenure = 2).
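For reference, the expansion itself boils down to a row-index vector. The sketch below uses only base R on a plain data.frame (the `df`, `idx` and `expanded` names are illustrative, and `set.seed()` is added only so the random column is reproducible): `rep()` repeats each row index `Tenure` times, and `sequence()` produces the 1..Tenure counter for each original row.

```r
# Base-R sketch of "replicate each row Tenure times" (no packages needed).
set.seed(1)  # only so the random Var column is reproducible
df <- data.frame(ID = 1:10,
                 Tenure = c(2, 3, 4, 2, 1, 1, 3, 4, 5, 2),
                 Var = rnorm(10))

# Row 1 repeated Tenure[1] times, row 2 repeated Tenure[2] times, ...
idx <- rep(seq_len(nrow(df)), times = df$Tenure)
head(idx)  # 1 1 2 2 2 3

expanded <- df[idx, ]
# sequence() gives 1..Tenure[i] for each original row: the within-ID counter
expanded$Indx <- sequence(df$Tenure)
rownames(expanded) <- NULL
head(expanded)
```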
I need my transformed dataset to look like the following:
setkey(data,ID)
print(data[,.(ID=rep(ID,Tenure))][data][, Indx := 1:.N, by=ID])
ID Tenure Var Indx
1: 1 2 -0.7289237 1
2: 1 2 -0.7289237 2
3: 2 3 -1.7353459 1
4: 2 3 -1.7353459 2
5: 2 3 -1.7353459 3
6: 3 4 0.4700703 1
...
...
Is there a more efficient (more data.table-idiomatic) way to do this? My approach is pretty slow. I was thinking there should be a way to do this with a by-without-by merge using .EACHI?
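This is not the `.EACHI` merge the question asks about, but one join-free alternative is sketched below, assuming `ID` is unique (so each group is a single original row). Grouping by every column and returning a vector of length `Tenure` from `j` makes data.table recycle the grouping columns, which replicates the row and builds the counter in a single call. `DT2` is an illustrative name.

```r
library(data.table)

set.seed(1)  # only so the random Var column is reproducible
data <- data.table(ID = 1:10, Tenure = c(2, 3, 4, 2, 1, 1, 3, 4, 5, 2), Var = rnorm(10))

# Each (ID, Tenure, Var) group is one row because ID is assumed unique.
# j returns seq_len(Tenure), so the group's columns are recycled Tenure times.
DT2 <- data[, .(Indx = seq_len(Tenure)), by = .(ID, Tenure, Var)]
head(DT2)
```

Groups come out in order of first appearance, so the rows stay in the original `ID` order.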
I don't think using a key/merge is helpful here. Just expand by passing a vector of row indices:
DT <- data[rep(1:.N,Tenure)][,Indx:=1:.N,by=ID]
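The counter step can also be written with data.table's `rowid()` (available since data.table 1.9.8), which numbers the occurrences of each `ID` without a grouped `:=`. A minimal sketch comparing the two, with an added `set.seed()` for reproducibility and `DT_rowid` as an illustrative name:

```r
library(data.table)

set.seed(1)  # only so the random Var column is reproducible
data <- data.table(ID = 1:10, Tenure = c(2, 3, 4, 2, 1, 1, 3, 4, 5, 2), Var = rnorm(10))

# Row-index expansion, counter built by group (as in the one-liner above)
DT <- data[rep(1:.N, Tenure)][, Indx := 1:.N, by = ID]

# Same expansion, counter built with rowid(): 1, 2, ... for each repeat of an ID
DT_rowid <- data[rep(1:.N, Tenure)][, Indx := rowid(ID)]

all.equal(DT, DT_rowid)
```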
You could try:
library(splitstackshape)
expandRows(data, "Tenure", drop = FALSE)[,Indx:=1:.N,by=ID][]
Or
library(dplyr)
library(splitstackshape)
expandRows(data, "Tenure", drop = FALSE) %>%
  group_by(ID) %>%
  mutate(Indx = row_number())
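If tidyr is available, its `uncount()` function can do the replication and the counter in one call: `.remove = FALSE` keeps the `Tenure` column and `.id` adds a 1..Tenure index within each original row. This is a sketch on a plain data.frame (with an added `set.seed()`), not part of the answer above.

```r
library(tidyr)

set.seed(1)  # only so the random Var column is reproducible
data <- data.frame(ID = 1:10, Tenure = c(2, 3, 4, 2, 1, 1, 3, 4, 5, 2), Var = rnorm(10))

# Repeat each row Tenure times, keep Tenure, and number the copies in "Indx"
out <- uncount(data, Tenure, .remove = FALSE, .id = "Indx")
head(out)
```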
Which gives:
ID Tenure Var Indx
1: 1 2 -0.8808717 1
2: 1 2 -0.8808717 2
3: 2 3 0.5962590 1
4: 2 3 0.5962590 2
5: 2 3 0.5962590 3
6: 3 4 0.1197176 1
7: 3 4 0.1197176 2
8: 3 4 0.1197176 3
9: 3 4 0.1197176 4
10: 4 2 -0.2821739 1