Filter grouped data with a conditional statement in R

The data frame below is grouped by id:

id<- c(1,1,1,2,3,3,4,4,4,5,5,6)
x <- c(0,1,0,0,1,1,0,0,1,0,0,1)
df <- data.frame(id, x)

I am looking for a way to filter the data in R based on a condition: if an id has a 1 in column x, delete the rows containing 0 that precede its first 1, while leaving the rest of the data unchanged. The expected output is:

> df
  id x
   1 1
   1 0
   2 0
   3 1
   3 1
   4 1
   5 0
   5 0
   6 1

I tried to subset the data using the filter() function from the dplyr package, as in the code below:

library(dplyr)

df <- df %>%
   group_by(id) %>%
   filter(first(x)==1 | x == 0)

but I am not getting the expected output. Any help would be greatly appreciated.
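To see why this attempt misbehaves, here is a minimal base R sketch that evaluates the same condition by hand on the x values of id 1. The vector x1 is a hypothetical stand-in copied from the data above, and x1[1] plays the role of first(x) within that group.

```r
x1 <- c(0, 1, 0)              # x values of id 1 in the question's data
keep <- x1[1] == 1 | x1 == 0  # first(x) == 1 | x == 0, evaluated within id 1
keep                          # TRUE FALSE TRUE: both 0s kept, the 1 dropped
x1[keep]                      # 0 0
```

Because first(x) is 0 for id 1, the condition reduces to x == 0, which keeps exactly the rows that should be dropped in any group that starts with 0 but contains a 1.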

T Richard asked Oct 24 '25 at 03:10


2 Answers

A rather awkward-looking base R approach:

# Inside subset(), x refers to the column df$x, not to the lambda's argument d
do.call(rbind, by(df, df$id, FUN = \(d) subset(d, cumsum(x) > 0 | all(x == 0, na.rm = TRUE))))
     id x
1.2   1 1
1.3   1 0
2     2 0
3.5   3 1
3.6   3 1
4     4 1
5.10  5 0
5.11  5 0
6     6 1

The row names look ugly, but they are informative: 3.5, for example, is original row 5 of id 3. However, the dplyr::filter() solution that lroha posted in the comments looks preferable.
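A sketch of the same cumsum() idea using base R's ave() instead of by(), which keeps the original row order and plain row names without the do.call(rbind, ...) step. This is an alternative, not the answerer's code, and like the answer it assumes x holds only 0/1 values (so cumsum(v) > 0 marks rows at or after the first 1).

```r
df <- data.frame(id = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 6),
                 x  = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1))

# For each id, flag rows at or after the first 1, or every row if the id has no 1.
# ave() writes the flags back into a copy of df$x (numeric), hence as.logical().
keep <- as.logical(ave(df$x, df$id, FUN = function(v) cumsum(v) > 0 | all(v == 0)))

out <- df[keep, ]
out
```

The result keeps original row names 2, 3, 4, 5, 6, 9, 10, 11 and 12.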

Edward answered Oct 25 '25 at 19:10


Using which.max()

library(dplyr)  # the .by argument requires dplyr >= 1.1.0

stopifnot(all(df$x %in% c(0, 1))) # Assumes 0/1 only.

df |> filter(row_number() >= which.max(x), .by = id)
# or
df |> slice(which.max(x):n(), .by = id)

#   id x
# 1  1 1
# 2  1 0
# 3  2 0
# 4  3 1
# 5  3 1
# 6  4 1
# 7  5 0
# 8  5 0
# 9  6 1
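For dplyr versions older than 1.1.0, which lack the .by argument, the same which.max() idea can be sketched with explicit grouping. This assumes only that dplyr is installed; as.data.frame() is there just to return a plain data frame instead of a tibble.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 5, 5, 6),
                 x  = c(0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1))

out <- df %>%
  group_by(id) %>%
  filter(row_number() >= which.max(x)) %>%  # drop rows before the first 1 in each id
  ungroup() %>%
  as.data.frame()

out
```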

Why does it work?

Two cases to consider:

  • If x contains a 1, which.max(x) is the position of its first 1 (which.max() returns the first maximum), so every row before it is removed.
  • Otherwise (all 0s), which.max(x) == 1, so nothing is removed.
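Both cases can be checked directly in base R using the x values of a few ids from the question:

```r
# which.max() returns the position of the FIRST maximum of a vector.
which.max(c(0, 1, 0))  # 2: id 1, keep rows from its first 1
which.max(c(0, 0, 1))  # 3: id 4
which.max(c(1, 1))     # 1: id 3, ties resolve to the first position
which.max(c(0, 0))     # 1: id 5, all zeros, so every row is kept
```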
sindri_baldur answered Oct 25 '25 at 17:10


