
data.table: Select n specific rows before & after other rows meeting a condition

Tags: r, data.table

Given the following example data table:

library(data.table)
DT <- fread("grp y exclude
a 1 0
a 2 0
a 3 0
a 4 1
a 5 0
a 7 1
a 8 0
a 9 0
a 10 0
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 1
b 7 1
b 8 0
b 9 0
b 10 0
c 5 1
d 1 0")

I want to select,

  1. by group grp,
  2. all rows that have y==5,
  3. plus up to two rows before and after each row from 2., within its group,
  4. where the rows in 3. are taken only from rows that have exclude==0 (excluded rows are skipped over, not merely dropped afterwards).

Assuming each group has at most one row with y==5, this yields the desired result for 1. to 3.:

idx <- -2:2 # 2 rows before match, the matching row itself, and two rows after match
(row_numbers <- DT[,.I[{
                         x <- rep(which(y==5),each=length(idx))+idx 
                         x[x>0 & x<=.N]
                       }], by=grp]$V1)
# [1]  3  4  5  6  7 12 13 14 15 16 20
DT[row_numbers]
#     grp y exclude
#  1:   a 3       0
#  2:   a 4       1
#  3:   a 5       0 # y==5 + two rows before and two rows after
#  4:   a 7       1
#  5:   a 8       0
#  6:   b 3       0
#  7:   b 4       1
#  8:   b 5       0 # y==5 + two rows before and two rows after
#  9:   b 6       1
# 10:   b 7       1
# 11:   c 5       1 # y==5 + nothing, because the group has only 1 element
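The offset arithmetic in the j expression can be checked outside data.table. The base R sketch below wraps it in a hypothetical helper, window_positions() (not part of the post or of data.table), and applies it to the y values of groups a and c from the example. Each match position is repeated once per offset, the offsets are recycled across those copies, and positions outside 1..n are dropped, just as x>0 & x<=.N does per group:

```r
# Hypothetical helper (not from the original post): positions within one
# group that lie within the given offsets of each match, clipped to 1..n.
window_positions <- function(is_match, offsets = -2:2) {
  n <- length(is_match)
  # each match position m is repeated once per offset, then the offsets are
  # recycled across the copies: m-2, m-1, m, m+1, m+2 for every match m
  x <- rep(which(is_match), each = length(offsets)) + offsets
  x[x > 0 & x <= n]  # drop positions that fall outside the group
}

y_a <- c(1, 2, 3, 4, 5, 7, 8, 9, 10)  # y values of group "a"
window_positions(y_a == 5)
# [1] 3 4 5 6 7

y_c <- 5                              # group "c" has a single row
window_positions(y_c == 5)
# [1] 1
```

The results are positions inside one group; in the question's code, .I[...] turns them into row numbers of DT. If two y==5 rows lie closer than five rows apart, their windows overlap and positions repeat, so unique() would be needed in that case.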

However, how do I incorporate 4. so that I get the following?

#     grp  y exclude
#  1:   a  2       0
#  2:   a  3       0
#  3:   a  5       0
#  4:   a  8       0
#  5:   a  9       0
#  6:   b  2       0
#  7:   b  3       0
#  8:   b  5       0
#  9:   b  8       0
# 10:   b  9       0
# 11:   c  5       1

It feels like I'm close, but I guess I've looked at heads and whiches for too long now, so I'd be thankful for some fresh ideas.

asked Mar 15 '17 22:03 by lukeA


2 Answers

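A bit more simplified: number the rows of DT (this adds an rn column to DT by reference), drop the excluded rows while keeping the y==5 rows, and within each group keep the rows at most two positions away from the y==5 row. Like the question, this assumes at most one y==5 row per group: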

DT[DT[, rn := .I][exclude==0 | y==5][, rn[abs(.I - .I[y==5]) <= 2], by=grp]$V1]

#     grp y exclude rn
#  1:   a 2       0  2
#  2:   a 3       0  3
#  3:   a 5       0  5
#  4:   a 8       0  7
#  5:   a 9       0  8
#  6:   b 2       0 11
#  7:   b 3       0 12
#  8:   b 5       0 14
#  9:   b 8       0 17
# 10:   b 9       0 18
# 11:   c 5       1 20
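The key step in this answer is the test abs(.I - .I[y==5]) <= 2, evaluated after the excluded rows are gone. The base R sketch below replays it for group a by hand, as an illustration rather than data.table code: rn and y are copied from the example after filtering, and pos stands in for .I. (Inside data.table, .I counts rows of the filtered table rather than restarting at 1 per group, but only differences between positions matter here.)

```r
# Group "a" after keeping only rows with exclude == 0 or y == 5.
rn  <- c(1L, 2L, 3L, 5L, 7L, 8L, 9L)  # original row numbers (the rn column)
y   <- c(1, 2, 3, 5, 8, 9, 10)        # y values of those rows
pos <- seq_along(rn)                  # stands in for .I within the group

# Distance of every remaining row from the y == 5 row; this relies on
# at most one match per group, as assumed in the question.
dist <- abs(pos - pos[y == 5])
sel  <- rn[dist <= 2]
sel
# [1] 2 3 5 7 8
```

For a group with no y==5 row (group d), pos[y == 5] is empty, so the comparison is empty as well and the group contributes nothing.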
answered Sep 24 '22 23:09 by thelatemail


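You are very close. Put the filter exclude==0 | y==5 into i, so the offsets are counted among each group's remaining rows only (y==5 rows are kept even when excluded, as for group c). Inside j, .I still holds row numbers of the full DT, so the result can index DT directly. This should do it: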

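idx <- -2:2 # as in the question: 2 rows before, the match itself, 2 rows after
row_numbers <- DT[exclude==0 | y==5, .I[{ # filter first, keeping y==5 rows even if excluded
    x <- rep(which(y==5), each=length(idx)) + idx # positions within the filtered group
    x[x>0 & x<=.N]                                # drop positions outside the group
  }], by=grp]$V1                                  # .I maps them to row numbers of DT
DT[row_numbers]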
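Why filtering in i is enough can be shown with a base R sketch of the same two steps (an illustration, not data.table's implementation): keep the row numbers of eligible rows, split them by group, apply the offsets to positions within each group's remaining rows, and map those positions back to row numbers of the full table. The data frame df is a hand-made copy of DT from the question, and r[...] plays the role of .I[...]:

```r
# Hand-made re-creation of the question's DT as a plain data.frame.
df <- data.frame(
  grp     = c(rep("a", 9), rep("b", 10), "c", "d"),
  y       = c(1, 2, 3, 4, 5, 7, 8, 9, 10, 1:10, 5, 1),
  exclude = c(0, 0, 0, 1, 0, 1, 0, 0, 0,
              0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0)
)
idx <- -2:2  # two rows before, the match, two rows after

# Step 1 (the i filter): original row numbers of rows that may be returned.
keep <- which(df$exclude == 0 | df$y == 5)

# Step 2 (the j expression, per group): offsets are applied to positions
# within the group's kept rows, so excluded rows are skipped over.
rows <- unlist(lapply(split(keep, df$grp[keep]), function(r) {
  x <- rep(which(df$y[r] == 5), each = length(idx)) + idx
  r[x[x > 0 & x <= length(r)]]  # map positions back to original row numbers
}), use.names = FALSE)

rows
# [1]  2  3  5  7  8 11 12 14 17 18 20
df[rows, ]
```

Because the offsets are applied after the excluded rows are removed, a4 and a7 are skipped rather than counted, which is why a2 and a9 appear in the result.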
answered Sep 23 '22 23:09 by dww