I have a data frame:
df <- data.frame(
        Group=c('A','A','A','A','B','B','B','B'),
        Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
        TimeLine=c(1,2,3,4,1,2,3,4)
      )
I want to keep only two activities for each group, and only when they occur in the order I specify. For example, I am looking for the activities EOSP and NOR, with EOSP occurring before NOR. This code:
df %>% group_by(Group) %>% 
        filter(all(c('EOSP','NOR') %in% Activity) & Activity %in% c('EOSP','NOR'))
results in:
# A tibble: 6 x 3
# Groups:   Group [2]
  Group Activity TimeLine
  <fct> <fct>       <dbl>
1 A     EOSP            1
2 A     NOR             2
3 A     EOSP            3
4 B     NOR             1
5 B     EOSP            2
6 B     NOR             4
I don't want row 3, as that EOSP occurs after NOR. Similarly, for group B I don't want row 4, as that NOR occurs before EOSP. How do I achieve this?
You can use match() to find the first row in each group where Activity is EOSP, and slice() to drop everything before it. Once that is done, you can remove duplicate activities within each group and filter on EOSP and NOR, i.e.
library(tidyverse)
df %>% 
 group_by(Group) %>% 
 mutate(new = match('EOSP', Activity)) %>% 
 slice(new:n()) %>% 
 distinct(Activity, .keep_all = TRUE) %>% 
 filter(Activity %in% c('EOSP', 'NOR'))
which gives,
# A tibble: 4 x 4
# Groups:   Group [2]
  Group Activity TimeLine   new
  <fct> <fct>       <dbl> <int>
1 A     EOSP            1     1
2 A     NOR             2     1
3 B     EOSP            2     2
4 B     NOR             4     2
NOTE 1: You can ungroup() and select(-new) afterwards to drop the grouping and the helper column.
NOTE 2: The warning messages issued here
Warning messages:
1: In new:4L : numerical expression has 4 elements: only the first used
2: In new:4L : numerical expression has 4 elements: only the first used
do not affect the result: only the first element of new is used, and all of its elements are the same within a group anyway.
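If you would rather avoid the warnings altogether, one possible variant is sketched below. It is not part of the original answer: it swaps the mutate() + slice() step for a filter() on row_number(), and the name result is just an illustrative choice. It assumes dplyr is installed and reuses the df from the question.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  Group    = c('A','A','A','A','B','B','B','B'),
  Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
  TimeLine = c(1,2,3,4,1,2,3,4)
)

result <- df %>%
  group_by(Group) %>%
  # Keep rows from the first EOSP onward. If a group has no EOSP,
  # match() returns NA, the comparison is NA, and filter() drops the rows.
  filter(row_number() >= match('EOSP', Activity)) %>%
  # Keep only the first occurrence of each activity within a group
  distinct(Activity, .keep_all = TRUE) %>%
  filter(Activity %in% c('EOSP', 'NOR')) %>%
  ungroup()

print(result)
```

Because no helper column is created, there is nothing to remove with select() afterwards. Like the version above, this keeps a lone EOSP row for a group in which no NOR follows the first EOSP, so add a further check if such groups should be dropped entirely.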