I have a data frame:
df <- data.frame(
Group=c('A','A','A','A','B','B','B','B'),
Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
TimeLine=c(1,2,3,4,1,2,3,4)
)
I want to keep only two activities for each group, and only when they occur in the order in which I list them. For example, I am looking only for the activities EOSP and NOR, in that order. This code:
library(dplyr)
df %>%
  group_by(Group) %>%
  filter(all(c('EOSP','NOR') %in% Activity) & Activity %in% c('EOSP','NOR'))
results in:
# A tibble: 6 x 3
# Groups: Group [2]
Group Activity TimeLine
<fct> <fct> <dbl>
1 A EOSP 1
2 A NOR 2
3 A EOSP 3
4 B NOR 1
5 B EOSP 2
6 B NOR 4
I don't want row 3, as EOSP occurs after NOR. Similarly for group B, I don't want row 4, as NOR occurs before EOSP. How do I achieve this?
You can use match to get the first instance of Activity == EOSP and use slice to remove everything before it. Once you have done that, you can remove duplicates and filter on EOSP and NOR, i.e.
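library(tidyverse)
df %>%
  group_by(Group) %>%
  # position of the first EOSP within each group
  mutate(new = match('EOSP', Activity)) %>%
  # keep rows from the first EOSP to the end of the group
  slice(new:n()) %>%
  # keep only the first occurrence of each activity per group
  distinct(Activity, .keep_all = TRUE) %>%
  filter(Activity %in% c('EOSP', 'NOR'))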
which gives,
# A tibble: 4 x 4
# Groups: Group [2]
Group Activity TimeLine new
<fct> <fct> <dbl> <int>
1 A EOSP 1 1
2 A NOR 2 1
3 B EOSP 2 2
4 B NOR 4 2
NOTE 1: You can add ungroup() and select(-new) to drop the grouping and the helper column.
NOTE 2: The warning messages issued here
(Warning messages:
1: In new:4L : numerical expression has 4 elements: only the first used
2: In new:4L : numerical expression has 4 elements: only the first used)
do not affect the result: only the first element of new is used, and all of its values within a group are the same anyway.
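If you would rather not see those warnings at all, one possible variant (a sketch, not the only way to do it) is to compare row_number() against the position returned by match instead of building a new:n() range. This assumes dplyr is loaded and uses the same df as in the question; res is just an illustrative name. A group with no EOSP at all is dropped here, because the comparison against NA yields NA and filter discards those rows.

```r
library(dplyr)

df <- data.frame(
  Group = c('A','A','A','A','B','B','B','B'),
  Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
  TimeLine = c(1,2,3,4,1,2,3,4)
)

res <- df %>%
  group_by(Group) %>%
  # keep rows from the first EOSP onward; no helper column, no warning
  filter(row_number() >= match('EOSP', Activity)) %>%
  # keep only the first occurrence of each activity within the group
  distinct(Activity, .keep_all = TRUE) %>%
  filter(Activity %in% c('EOSP', 'NOR')) %>%
  ungroup()

res
```

Like the version above, this keeps an EOSP row even if no NOR follows it in that group, so add a further check if that case can occur in your data.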