
Flag observations before and after a specific value in another column

Say I have a df:

df <- data.frame(flag = c(rep(0, 20)),
                 include = c(rep(1, 20)))
df[c(4,8,16), ]$flag <- 1
df

   flag include
1     0       1
2     0       1
3     0       1
4     1       1
5     0       1
6     0       1
7     0       1
8     1       1
9     0       1
10    0       1
11    0       1
12    0       1
13    0       1
14    0       1
15    0       1
16    1       1
17    0       1
18    0       1
19    0       1
20    0       1

What I wish to do is change the include flag to 0 if the row is within +/- two rows of a row where flag == 1. The result would look like:

   flag include
1     0       1
2     0       0
3     0       0
4     1       1
5     0       0
6     0       0
7     0       0
8     1       1
9     0       0
10    0       0
11    0       1
12    0       1
13    0       1
14    0       0
15    0       0
16    1       1
17    0       0
18    0       0
19    0       1
20    0       1

I've thought of some 'innovative' (read: inefficient and overcomplicated) ways to do it, but I suspect there is a simple way I'm overlooking.

Ideally the answer would generalize to +/- n rows, since my real data is much larger and I may need to search within +/- 10 rows.

Bucket asked Mar 09 '23 01:03



1 Answer

One option using data.table:

library(data.table)
n = 2
# find the row numbers where flag is 1
flag_one = which(df$flag == 1)

# build every row index within +/- n of a flagged row,
# then drop the flagged rows themselves (setdiff also removes duplicates)
idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)

# update include in place, keeping only indices that fall inside the table
setDT(df)[idx[idx >= 1 & idx <= nrow(df)], include := 0][]

# or, as @Frank commented, the last step in base R would be
# df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0

#    flag include
# 1:    0       1
# 2:    0       0
# 3:    0       0
# 4:    1       1
# 5:    0       0
# 6:    0       0
# 7:    0       0
# 8:    1       1
# 9:    0       0
#10:    0       0
#11:    0       1
#12:    0       1
#13:    0       1
#14:    0       0
#15:    0       0
#16:    1       1
#17:    0       0
#18:    0       0
#19:    0       1
#20:    0       1
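
To see why this works, it may help to look at the intermediate matrix that outer() builds. The following is only a small base R sketch, assuming the same flagged rows as in the example (4, 8, 16) and n = 2; the name offsets is introduced here for illustration and is not part of the answer's code.

```r
n <- 2
flag_one <- c(4, 8, 16)  # rows where flag == 1 in the example df

# Each row of the matrix holds one flagged row plus the offsets -n..n
offsets <- outer(flag_one, -n:n, "+")
offsets
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    2    3    4    5    6
# [2,]    6    7    8    9   10
# [3,]   14   15   16   17   18

# setdiff() flattens the matrix, drops duplicates (6 appears twice)
# and removes the flagged rows themselves
idx <- setdiff(offsets, flag_one)
sort(idx)
# [1]  2  3  5  6  7  9 10 14 15 17 18
```

With flags near the top or bottom of the table, offsets can contain indices below 1 or above nrow(df), which is why the answer filters idx before assigning.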

Wrapped in a function, so that it works for any n:

update_n <- function(df, n) {
    flag_one = which(df$flag == 1)
    idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
    df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
    df
}
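
As a quick check, the function can be run against the example data from the question. This is a usage sketch only: update_n is repeated so the snippet runs on its own, and res2 and res10 are illustrative names, not part of the original answer.

```r
# Same function as in the answer, repeated so the snippet is standalone
update_n <- function(df, n) {
    flag_one = which(df$flag == 1)
    idx = setdiff(outer(flag_one, -n:n, "+"), flag_one)
    df$include[idx[idx >= 1 & idx <= nrow(df)]] = 0
    df
}

# Example data from the question
df <- data.frame(flag = rep(0, 20), include = rep(1, 20))
df$flag[c(4, 8, 16)] <- 1

# n = 2 reproduces the expected result from the question
res2 <- update_n(df, 2)
which(res2$include == 0)
# [1]  2  3  5  6  7  9 10 14 15 17 18

# n = 10 covers every row of this small table, so only the
# flagged rows themselves keep include == 1
res10 <- update_n(df, 10)
which(res10$include == 1)
# [1]  4  8 16
```

Because the function copies df rather than modifying it in place, the original data frame is left unchanged, unlike the setDT() version above.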
Psidom answered Mar 11 '23 11:03
