Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replicate a dataset with a new ID column in R

Tags:

r

I want to replicate a dataset that contains different "ID" several time and create a new column for a new unique ID. The original dataset looks as follows:

zz <- "ID  TIME
A   0.06
A   0.26
A   0.31
A   0.47
B   0.17
B   0.37
B   0.48
B   0.55
C   0.28
C   0.40
C   0.62
C   0.76"
df <- read.table(text = zz, header = TRUE)

The intended output after replicating the data 3 time and adding a new unique ID column (New.ID) should look as follows:

ID  TIME New.ID
A   0.06    1
A   0.26    1
A   0.31    1
A   0.47    1
B   0.17    2
B   0.37    2
B   0.48    2
B   0.55    2
C   0.28    3
C   0.40    3
C   0.62    3
C   0.76    3
A   0.06    4
A   0.26    4
A   0.31    4
A   0.47    4
B   0.17    5
B   0.37    5
B   0.48    5
B   0.55    5
C   0.28    6
C   0.40    6
C   0.62    6
C   0.76    6
A   0.06    7
A   0.26    7
A   0.31    7
A   0.47    7
B   0.17    8
B   0.37    8
B   0.48    8
B   0.55    8
C   0.28    9
C   0.40    9
C   0.62    9
C   0.76    9
like image 499
Amir Youssef Avatar asked Oct 26 '25 10:10

Amir Youssef


1 Answers

An option is rep to replicate the rows by 3 and then with rleid create a new ID column by incrementing the index when the adjacent elements are not the same in 'ID'

library(data.table)
setDT(df1)[rep(seq_len(.N), 3)][, New.ID := rleid(ID)][]
#    ID TIME New.ID
# 1:  A 0.06      1
# 2:  A 0.26      1
# 3:  A 0.31      1
# 4:  A 0.47      1
# 5:  B 0.17      2
# 6:  B 0.37      2
# 7:  B 0.48      2
# 8:  B 0.55      2
# 9:  C 0.28      3
#10:  C 0.40      3
#11:  C 0.62      3
#12:  C 0.76      3
#13:  A 0.06      4
#14:  A 0.26      4
#15:  A 0.31      4
#16:  A 0.47      4
#17:  B 0.17      5
#18:  B 0.37      5
#19:  B 0.48      5
#20:  B 0.55      5
#21:  C 0.28      6
#22:  C 0.40      6
#23:  C 0.62      6
#24:  C 0.76      6
#25:  A 0.06      7
#26:  A 0.26      7
#27:  A 0.31      7
#28:  A 0.47      7
#29:  B 0.17      8
#30:  B 0.37      8
#31:  B 0.48      8
#32:  B 0.55      8
#33:  C 0.28      9
#34:  C 0.40      9
#35:  C 0.62      9
#36:  C 0.76      9

Or using dplyr

library(dplyr)
df1 %>% 
     slice(rep(row_number(), 3)) %>%
     mutate(New.ID = rleid(ID))

data

df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), TIME = c(0.06, 0.26, 0.31, 0.47, 0.17, 0.37, 
0.48, 0.55, 0.28, 0.4, 0.62, 0.76)), class = "data.frame", row.names = c(NA, 
-12L))
like image 147
akrun Avatar answered Oct 29 '25 00:10

akrun



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!