Filtering on a grouped variable while maintaining sequence

Tags:

r

dplyr

I have a data frame:

df <- data.frame(
        Group=c('A','A','A','A','B','B','B','B'),
        Activity = c('EOSP','NOR','EOSP','COSP','NOR','EOSP','WL','NOR'),
        TimeLine=c(1,2,3,4,1,2,3,4)
      )

I want to keep only two activities for each group, and only in the order in which I list them. For example, I am looking for the activities EOSP and NOR, with EOSP coming before NOR. This code:

library(dplyr)

df %>% group_by(Group) %>%
        filter(all(c('EOSP','NOR') %in% Activity) & Activity %in% c('EOSP','NOR'))

results in:

# A tibble: 6 x 3
# Groups:   Group [2]
  Group Activity TimeLine
  <fct> <fct>       <dbl>
1 A     EOSP            1
2 A     NOR             2
3 A     EOSP            3
4 B     NOR             1
5 B     EOSP            2
6 B     NOR             4

I don't want row 3, because that EOSP occurs after NOR. Similarly, for group B I don't want row 4, because that NOR occurs before EOSP. How do I achieve this?
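The filter above cannot enforce any order because %in% is an element-wise membership test: each Activity value is checked against the set on its own, regardless of its position. A minimal check on group A's activities (a sketch; the variable names are only for illustration):

```r
# Group A's activities, in TimeLine order
activity_a <- c('EOSP', 'NOR', 'EOSP', 'COSP')

# %in% asks "is this element one of the wanted values?" for each element
# separately, so the second EOSP (after NOR) passes just like the first one
keep <- activity_a %in% c('EOSP', 'NOR')
print(keep)  # TRUE TRUE TRUE FALSE
```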

Dhiraj asked Mar 05 '23 22:03


1 Answer

You can use match to get the position of the first EOSP in each group and slice to remove every row before it. Once that is done, remove duplicate activities (keeping the first occurrence of each) and filter on EOSP and NOR, i.e.

library(tidyverse)

df %>% 
 group_by(Group) %>% 
 mutate(new = match('EOSP', Activity)) %>% 
 slice(new:n()) %>% 
 distinct(Activity, .keep_all = TRUE) %>% 
 filter(Activity %in% c('EOSP', 'NOR'))

which gives:

# A tibble: 4 x 4
# Groups:   Group [2]
  Group Activity TimeLine   new
  <fct> <fct>       <dbl> <int>
1 A     EOSP            1     1
2 A     NOR             2     1
3 B     EOSP            2     2
4 B     NOR             4     2
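To see what each step of the pipeline does, here is a base R sketch that replays the same steps by hand on group B alone. It uses plain vectors instead of a grouped tibble, the variable names are hypothetical, and !duplicated() stands in for distinct():

```r
# Group B's rows, in TimeLine order
activity <- c('NOR', 'EOSP', 'WL', 'NOR')
timeline <- c(1, 2, 3, 4)

# Step 1: position of the first EOSP (match() returns the first hit only)
start <- match('EOSP', activity)

# Step 2: drop every row before that position, as slice(new:n()) does
kept_act  <- activity[start:length(activity)]
kept_time <- timeline[start:length(timeline)]

# Step 3: keep only the first occurrence of each activity, like distinct()
first     <- !duplicated(kept_act)
kept_act  <- kept_act[first]
kept_time <- kept_time[first]

# Step 4: keep only the two activities of interest
wanted <- kept_act %in% c('EOSP', 'NOR')
result <- data.frame(Activity = kept_act[wanted],
                     TimeLine = kept_time[wanted],
                     stringsAsFactors = FALSE)
print(result)
```

The leading NOR is discarded in step 2, which is exactly the row the question wanted gone for group B.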

NOTE 1: Afterwards you can call ungroup() and select(-new) to drop the grouping and the helper column.
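A sketch of the full pipeline with those two calls appended. It assumes dplyr is installed and sets stringsAsFactors = FALSE so the columns are character on any R version; the warnings described in NOTE 2 are still emitted.

```r
library(dplyr)

df <- data.frame(
  Group    = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
  Activity = c('EOSP', 'NOR', 'EOSP', 'COSP', 'NOR', 'EOSP', 'WL', 'NOR'),
  TimeLine = c(1, 2, 3, 4, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Group) %>%
  mutate(new = match('EOSP', Activity)) %>%   # first EOSP position per group
  slice(new:n()) %>%                          # drop rows before it
  distinct(Activity, .keep_all = TRUE) %>%    # first occurrence of each activity
  filter(Activity %in% c('EOSP', 'NOR')) %>%
  ungroup() %>%                               # remove the grouping
  select(-new)                                # remove the helper column

print(result)
```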

NOTE 2: The warning messages issued here

Warning messages:
1: In new:4L : numerical expression has 4 elements: only the first used
2: In new:4L : numerical expression has 4 elements: only the first used

do not affect the result: the colon operator only uses the first element of new, and within each group all elements of new are the same anyway.
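If the warnings are unwanted, one possible variant (a sketch, not part of the original answer) computes the start position directly inside slice(). Because match() returns a single integer per group, the colon operator receives a scalar, no warning is raised, and no helper column is created. This assumes every group contains at least one EOSP: for a group without one, match() returns NA and NA:n() stops with an error.

```r
library(dplyr)

df <- data.frame(
  Group    = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
  Activity = c('EOSP', 'NOR', 'EOSP', 'COSP', 'NOR', 'EOSP', 'WL', 'NOR'),
  TimeLine = c(1, 2, 3, 4, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(Group) %>%
  slice(match('EOSP', Activity):n()) %>%      # scalar start, so no warning
  distinct(Activity, .keep_all = TRUE) %>%    # first occurrence of each activity
  filter(Activity %in% c('EOSP', 'NOR')) %>%
  ungroup()

print(result)
```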

Sotos answered Mar 09 '23 14:03