How to compare with values adjacent in a sequence in the same group

Question

Let's say I have something like this:

set.seed(0)
the.df <- data.frame(x = rep(letters[1:3], each = 4),
                     n = rep(0:3, 3),
                     val = round(runif(12)))
the.df


   x n val
1  a 0   1
2  a 1   0
3  a 2   0
4  a 3   1
5  b 0   1
6  b 1   0
7  b 2   1
8  b 3   1
9  c 0   1
10 c 1   1
11 c 2   0
12 c 3   0

Within each x, starting from n==2 (going from small to large), I want to set val to 0 if the previous val (in terms of n) is 0; otherwise, leave it as is.

For example, in the subset x=="b", I first ignore the two rows where n < 2. Now, in Row 7, because the previous val is 0 (the.df$val[the.df$x=="b" & the.df$n==1]), I set val to 0 (the.df$val[the.df$x=="b" & the.df$n==2] <- 0). Then on Row 8, now that val for the previous n is 0 (we just set it), I also want to set val here to 0 (the.df$val[the.df$x=="b" & the.df$n==3] <- 0).

Imagine that the data.frame is not sorted, so any procedure that depends on row order would need to sort first. I also can't assume that adjacent rows exist (e.g., the row the.df[the.df$x=="a" & the.df$n==1, ] might be missing).

The trickiest part seems to be evaluating val in sequence. I can do this using a loop but I imagine that it would be inefficient (I have millions of rows). Is there a way I can do this more efficiently?
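
For reference, here is a minimal sketch of the kind of loop I mean. The helper name fix_group_loop is made up for illustration, val is hardcoded to the values printed above, and the sketch assumes that "previous" means the preceding existing row within x after sorting by n (which may not be the right reading when rows are missing).

```r
# Hypothetical helper: apply the rule to one group's val, already sorted by n.
fix_group_loop <- function(val, n) {
  for (i in seq_along(val)) {
    # Only rows with n >= 2 that have a predecessor are eligible.
    # val[i - 1] may itself have been zeroed on an earlier iteration.
    if (n[i] >= 2 && i > 1 && val[i - 1] == 0) val[i] <- 0
  }
  val
}

the.df <- data.frame(x = rep(letters[1:3], each = 4),
                     n = rep(0:3, 3),
                     val = c(1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0))

# Sort first, since the rule depends on the order of n within x.
the.df <- the.df[order(the.df$x, the.df$n), ]
the.df$wanted <- NA_real_
for (g in unique(the.df$x)) {
  rows <- which(the.df$x == g)
  the.df$wanted[rows] <- fix_group_loop(the.df$val[rows], the.df$n[rows])
}
the.df$wanted
```

This reproduces the wanted column on the small example, but it visits every row one at a time, which is what I'd like to avoid.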

EDIT: wanted output

the.df

   x n val wanted
1  a 0   1      1
2  a 1   0      0
3  a 2   0      0
4  a 3   1      0
5  b 0   1      1
6  b 1   0      0
7  b 2   1      0
8  b 3   1      0
9  c 0   1      1
10 c 1   1      1
11 c 2   0      0
12 c 3   0      0

Also, I don't mind making new columns (e.g., putting the wanted values there).

David Arenburg · Accepted Answer

Using data.table, I would try the following:

library(data.table)
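# Process rows in order of n within each x; indx holds positions of zeros from the 2nd row on
setDT(the.df)[order(n), 
          val := if(length(indx <- which(val[2:.N] == 0L))) 
            # keep everything up to the first such zero, then fill the rest with 0
            c(val[1:(indx[1L] + 1L)], rep(0L, .N - (indx[1L] + 1L))) 
          else val,  # no zero found: leave this group's val unchanged
          by = x]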
the.df
#     x n val
#  1: a 0   1
#  2: a 1   0
#  3: a 2   0
#  4: a 3   0
#  5: b 0   1
#  6: b 1   0
#  7: b 2   0
#  8: b 3   0
#  9: c 0   1
# 10: c 1   1
# 11: c 2   0
# 12: c 3   0

This processes the rows in order of n within each x (since you said the real data isn't sorted) and rebuilds val conditionally: if the condition isn't satisfied for a group, its val is left untouched.
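
If data.table isn't an option, the same idea can be sketched in base R with cummin() and ave(). This is not part of the approach above, just an alternative under stated assumptions: the helper name propagate_zero is hypothetical, val contains no NAs, and "previous" is taken to mean the preceding existing row within x once sorted by n. The rows are deliberately reversed to mimic unsorted input.

```r
# Hypothetical helper (not from data.table): v is one group's val, sorted by n.
propagate_zero <- function(v) {
  # alive[k] stays 1 only while every value from position 2 up to k is non-zero
  alive <- cummin(c(1, v[-1] != 0))
  # positions 1 and 2 always keep their value; position i >= 3 keeps it
  # only if positions 2..(i - 1) were all non-zero
  v * c(1, head(alive, -1))
}

# Same values as the example, with rows reversed to mimic unsorted data.
df <- data.frame(x = rep(letters[1:3], each = 4),
                 n = rep(0:3, 3),
                 val = c(1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0))[12:1, ]

ord <- order(df$x, df$n)              # row positions sorted by x, then n
wanted <- numeric(nrow(df))
wanted[ord] <- ave(df$val[ord], df$x[ord], FUN = propagate_zero)
df$wanted <- wanted                   # written back in the original row order
df[ord, ]
```

The result is written back through ord, so the data frame itself never has to be physically sorted.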

Hopefully this kind of sub-assignment will be implemented in data.table in the near future (the syntax below is not valid yet), and then the code could potentially be:

setDT(the.df)[order(n), val[n > 2] := if(val[2L] == 0) 0L, by = x]

That would be a great improvement in both performance and syntax.

Tags: r

ceiling cat
