I have the following data.table.
ts,id
1,a
2,a
3,a
4,a
5,a
6,a
7,a
1,b
2,b
3,b
4,b
I want to split this data.table into two. The criterion is that approximately the first half of the rows for each group (in this case, column "id") goes into one data.table and the remaining rows into another. So the expected result is two data.tables, as follows:
ts,id
1,a
2,a
3,a
4,a
1,b
2,b
and
ts,id
5,a
6,a
7,a
3,b
4,b
I tried the following,
z1 = x[,.SD[.I < .N/2,],by=id]
z1
and got just the following
id ts
a 1
a 2
a 3
Somehow, .I within the .SD isn't working the way I think it should. Any help appreciated. Thanks in advance.
.I gives the row locations with respect to the whole data.table, so it can't be used like that within .SD.
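To see the difference, here is a small sketch that rebuilds the sample data from the question and lists .I next to a per-group counter. The column names global_row and row_in_group are only illustrative.

```r
library(data.table)

# Sample data from the question: 7 rows for "a", 4 rows for "b"
DT <- data.table(ts = c(1:7, 1:4), id = rep(c("a", "b"), c(7, 4)))

# .I holds row numbers in the whole table;
# seq_len(.N) restarts at 1 within each group
idx <- DT[, .(ts, global_row = .I, row_in_group = seq_len(.N)), by = id]
print(idx)
```

For group b, .I runs from 8 to 11 while .N/2 is 2, so .I < .N/2 is never true there, which is why only rows from group a came back.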
Something like
DT[, subset := seq_len(.N) > .N/2,by='id']
subset1 <- DT[(subset)][,subset:=NULL]
subset2 <- DT[!(subset)][,subset:=NULL]
subset1
# ts id
# 1: 4 a
# 2: 5 a
# 3: 6 a
# 4: 7 a
# 5: 3 b
# 6: 4 b
subset2
# ts id
# 1: 1 a
# 2: 2 a
# 3: 3 a
# 4: 1 b
# 5: 2 b
Should work
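Note that the split above puts the extra row of an odd-sized group into the second half. If the first table should get it instead, as in the expected output in the question, one possible variant (a sketch, assuming ceiling(.N/2) is the intended cut-off) collects row indices with .I and leaves DT unmodified:

```r
library(data.table)

# Sample data from the question
DT <- data.table(ts = c(1:7, 1:4), id = rep(c("a", "b"), c(7, 4)))

# Row numbers (in the whole table) of the first ceiling(.N/2) rows of each id
first_rows <- DT[, .I[seq_len(.N) <= ceiling(.N / 2)], by = id]$V1

first  <- DT[first_rows]    # first half of each group
second <- DT[-first_rows]   # everything else

print(first)
print(second)
```

Because the selection is done by row index, no helper column has to be added to DT and removed again afterwards.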
For more than two subsets, you could use cut to assign each row within a group to one of the required number of bins (with labels = FALSE it returns integer bin codes rather than a factor). Something like
DT[, subset := cut(seq_len(.N), 3, labels= FALSE),by='id']
# you could copy to the global environment a subset for each, but this
# will not be memory efficient!
list2env(setattr(split(DT, DT[['subset']]),'names', paste0('s',1:3)), .GlobalEnv)
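If copying into the global environment is not needed, an alternative is to keep the pieces in a named list. This is only a sketch: it assumes a data.table version that provides the by argument of split (1.9.8 or later), and the column name part and the list name parts are made up for the example.

```r
library(data.table)

# Sample data from the question
DT <- data.table(ts = c(1:7, 1:4), id = rep(c("a", "b"), c(7, 4)))

# Integer bin code 1..3 for each row within its id
DT[, part := cut(seq_len(.N), 3, labels = FALSE), by = id]

# One data.table per bin, kept together in a list
parts <- split(DT, by = "part")
names(parts) <- paste0("s", names(parts))

print(parts$s1)
```

The individual pieces are then available as parts$s1, parts$s2 and parts$s3 without cluttering the workspace.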