I have a data frame like this one:
> df
id type
1 1 a
2 1 a
3 1 b
4 1 a
5 1 b
6 2 a
7 2 a
8 2 b
9 3 a
10 3 a
I want to keep all rows for each group (id) up to the first occurrence of value 'b' in the type column. For groups without type 'b', I want to keep all their rows.
The resulting data frame should look like this:
> dfnew
id type
1 1 a
2 1 a
3 1 b
4 2 a
5 2 a
6 2 b
7 3 a
8 3 a
I tried the following code, but it retains additional rows that have the value 'a' beyond the first occurrence of 'b', and only excludes additional occurrences of 'b', which is not what I want. Look at row 4 in the following. I want to rid of it.
> df %>% group_by(id) %>% filter(cumsum(type == 'b') <= 1)
Source: local data frame [7 x 2]
Groups: id
id type
1 1 a
2 1 a
3 1 b
4 1 a
5 2 a
6 2 a
7 2 b
8 3 a
9 3 a
You could combine match
or which
with slice
or (as mentioned by @Richard) which.max
library(dplyr)
df %>%
group_by(id) %>%
slice(if(any(type == "b")) 1:which.max(type == "b") else row_number())
# Source: local data table [8 x 2]
# Groups: id
#
# id type
# 1 1 a
# 2 1 a
# 3 1 b
# 4 2 a
# 5 2 a
# 6 2 b
# 7 3 a
# 8 3 a
Or you could try it with data.table
library(data.table)
setDT(df)[, if(any(type == "b")) .SD[1:which.max(type == "b")] else .SD, by = id]
# id type
# 1: 1 a
# 2: 1 a
# 3: 1 b
# 4: 2 a
# 5: 2 a
# 6: 2 b
# 7: 3 a
# 8: 3 a
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With