 

Delete rows after a certain sequence of values in a certain column

Tags: r, rows

a <- c("A","A","A","B","B","B","C","C","C","C","D","D","D","D","D")
b <- c("x","y","z","x","x","z","y","z","z","z","y","z","z","z","x")
df = data.frame(a,b)


    a   b
1   A   x
2   A   y
3   A   z
4   B   x
5   B   x
6   B   z
7   C   y
8   C   z
9   C   z
10  C   z
11  D   y
12  D   z
13  D   z
14  D   z
15  D   x

For every group A, B, C, D in column a, I'd like to delete the z values in column b whenever the group ends with a y followed by one or more z's.

For a=="C", where the b values are y,z,z,z, I'd like to delete all three z's. In a=="D", however, nothing has to change, because x is the last value.
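The rule can be restated in terms of runs: collapse a group's b values into runs of identical values; if the last run is "z" and the run just before it is "y", drop that last run, however long it is. A minimal sketch using base R's rle(), where keep_rows() is a hypothetical helper that works on one group's values and returns a logical keep mask:

```r
# Hypothetical helper: keep_rows() returns TRUE for the rows to keep in one group.
# If the group ends in a run of "z" and the run just before it is "y",
# the whole final run of "z" is dropped; otherwise every row is kept.
keep_rows <- function(b) {
  r <- rle(as.character(b))   # collapse consecutive repeats into runs (rle() rejects factors)
  n <- length(r$values)
  if (n >= 2 && r$values[n] == "z" && r$values[n - 1] == "y") {
    # keep everything except the last r$lengths[n] positions
    seq_along(b) <= length(b) - r$lengths[n]
  } else {
    rep(TRUE, length(b))
  }
}

keep_rows(c("x", "y", "z"))            # group A: TRUE TRUE FALSE
keep_rows(c("y", "z", "z", "z"))       # group C: TRUE FALSE FALSE FALSE
keep_rows(c("y", "z", "z", "z", "x"))  # group D: TRUE TRUE TRUE TRUE TRUE
```

Because rle() reduces y followed by three z's and y followed by twenty z's to the same two runs, the length of the trailing run does not matter.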

The result should look like this:

    a   b
1   A   x
2   A   y
4   B   x
5   B   x
6   B   z
7   C   y
11  D   y
12  D   z
13  D   z
14  D   z
15  D   x

By grouping in dplyr, I can identify the last occurrence of each value within a group, so the basic case shown for a=="A" is not a problem. I have trouble finding a solution for the case of a=="C", where a single y could be followed by 20 occurrences of z.
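The last-occurrence idea can be extended: instead of the last occurrence of each value, look for the last value in each group that is not "z" and check whether it is "y". A minimal base R sketch of that restatement, using ave() in place of dplyr; keep and last_non_z are illustrative names, and the data frame is rebuilt so the block runs on its own. A group made up only of "z" values is kept unchanged here, which is an assumption, since the question does not cover that case.

```r
a <- c("A","A","A","B","B","B","C","C","C","C","D","D","D","D","D")
b <- c("x","y","z","x","x","z","y","z","z","z","y","z","z","z","x")
df <- data.frame(a, b)

# ave() calls FUN once per group of a, passing that group's row indices.
keep <- ave(seq_len(nrow(df)), df$a, FUN = function(i) {
  vals <- as.character(df$b[i])
  # position of the last value that is not "z" (0 if the group is all "z")
  last_non_z <- max(c(0L, which(vals != "z")))
  if (last_non_z > 0 && vals[last_non_z] == "y") {
    seq_along(i) <= last_non_z   # drop the trailing run of "z" after the "y"
  } else {
    rep(TRUE, length(i))         # nothing to drop in this group
  }
})

# ave() returns a vector of the same type as its first argument (integer here),
# so the 0/1 flags are converted back to logical before subsetting.
df[as.logical(keep), ]
```

Because ave() writes each group's flags back to that group's own rows, this does not rely on the rows being sorted by a, and one trailing z or twenty makes no difference.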

asked Jan 08 '23 18:01 by rmuc8


1 Answer

You can use by and cummin in base R:

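# by() returns one logical mask per group (in level order); unlist() joins them,
# which lines up with the rows only because df is already sorted by a
df[unlist(by(df$b, interaction(df$a), FUN = function(x) {
  # tmp is 1 for the trailing run of "z" values, 0 elsewhere
  tmp <- rev(cummin(rev(x == "z")))
  # drop that trailing run only if the last non-"z" value is "y"
  if (tail(x[!tmp], 1) == "y") !tmp else rep(TRUE, length(x))
})), ]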

The result:

   a b
1  A x
2  A y
4  B x
5  B x
6  B z
7  C y
11 D y
12 D z
13 D z
14 D z
15 D x
answered Jan 11 '23 19:01 by Sven Hohenstein
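The mask in the answer is built by rev(cummin(rev(x == "z"))). Tracing it on group C's values shows why it marks exactly the trailing run of z's: read from the end, the cumulative minimum stays 1 only until the first non-z value appears. In this sketch x and tmp mirror the answer's names; is_z, from_end and x_d are illustrative names added here.

```r
# Step-by-step view of the mask, on group C's values.
x <- c("y", "z", "z", "z")

is_z <- x == "z"                 # FALSE TRUE TRUE TRUE
from_end <- cummin(rev(is_z))    # 1 1 1 0: stays 1 only while the values read from the end are "z"
tmp <- rev(from_end)             # 0 1 1 1: 1 marks the trailing run of "z"
x[!tmp]                          # "y": what is left once the trailing run is removed
tail(x[!tmp], 1) == "y"          # TRUE, so !tmp (TRUE FALSE FALSE FALSE) is the keep mask

# Group D ends in "x", so there is no trailing run of "z" and nothing is marked.
x_d <- c("y", "z", "z", "z", "x")
rev(cummin(rev(x_d == "z")))     # 0 0 0 0 0
```

One caveat: if a group contained only z values, x[!tmp] would be empty, tail(x[!tmp], 1) == "y" would be logical(0), and if() would stop with an "argument is of length zero" error; a length check before the comparison would avoid that.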