How to mutate a column based on values occurring in a particular sequence?

I have a data frame df.

  df <- data.frame(
    ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
    process = c("inspection", "evaluation", "result",
                "inspection", "result", "evaluation",
                "result", "inspection", "result",
                "evaluation", "result", "result", "evaluation")
  )

I need to add a column true_process that is TRUE for every row of an ID if evaluation comes before result within that ID. If evaluation comes after result, or if it is missing, the column should be FALSE.

This is the code I have tried, along with the output it gives:

library(dplyr)
df %>% 
    group_by(ID) %>% 
    mutate(true_process = case_when(
        !any(process == "evaluation") ~ "False",
        length(process == "evaluation")[[1]] > length(process == "result")[[1]] ~ "False",
        TRUE ~ "True"
    )) 
# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <fct>      <chr>       
 1     1 inspection True        
 2     1 evaluation True        
 3     1 result     True        
 4     2 inspection True        
 5     2 result     True        
 6     2 evaluation True        
 7     3 result     False       
 8     3 inspection False       
 9     3 result     False       
10     4 evaluation True        
11     4 result     True        
12     4 result     True        
13     4 evaluation True 

The expected output is as follows:

# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <fct>      <lgl>       
 1     1 inspection TRUE        
 2     1 evaluation TRUE        
 3     1 result     TRUE        
 4     2 inspection FALSE       
 5     2 result     FALSE       
 6     2 evaluation FALSE       
 7     3 result     FALSE       
 8     3 inspection FALSE       
 9     3 result     FALSE       
10     4 evaluation FALSE       
11     4 result     FALSE       
12     4 result     FALSE       
13     4 evaluation FALSE    
Silent_bliss asked Oct 16 '22 01:10


1 Answer

Based on your updated data, you can check, within each ID, whether the index of the last instance of evaluation is less than at least one of the indices of result.

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(true_process = any(tail(which(process == "evaluation"), 1) < which(process == "result")))


# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <chr>      <lgl>       
 1     1 inspection TRUE        
 2     1 evaluation TRUE        
 3     1 result     TRUE        
 4     2 inspection FALSE       
 5     2 result     FALSE       
 6     2 evaluation FALSE       
 7     3 result     FALSE       
 8     3 inspection FALSE       
 9     3 result     FALSE       
10     4 evaluation FALSE       
11     4 result     FALSE       
12     4 result     FALSE       
13     4 evaluation FALSE
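
To see why this also covers the IDs with no evaluation at all, it may help to pull the check out into a small function and run it on one group at a time. The sketch below is only an illustration in base R: the helper name evaluation_before_result is hypothetical and not part of dplyr or of the code above, and it assumes each group is passed in as a vector of process values in row order.

```r
# Hypothetical helper (an assumption for illustration, not part of the answer's
# pipeline): TRUE when the last "evaluation" sits before at least one "result".
evaluation_before_result <- function(process) {
  eval_idx   <- which(process == "evaluation")  # positions of every evaluation
  result_idx <- which(process == "result")      # positions of every result
  last_eval  <- tail(eval_idx, 1)               # integer(0) when there is no evaluation

  # Comparing integer(0) with anything gives logical(0), and any(logical(0))
  # is FALSE, so a group without an evaluation needs no special case.
  any(last_eval < result_idx)
}

# One call per ID from the question's data
id1 <- evaluation_before_result(c("inspection", "evaluation", "result"))
id2 <- evaluation_before_result(c("inspection", "result", "evaluation"))
id3 <- evaluation_before_result(c("result", "inspection", "result"))
id4 <- evaluation_before_result(c("evaluation", "result", "result", "evaluation"))

print(c(id1, id2, id3, id4))
```

For the four IDs this gives TRUE, FALSE, FALSE, FALSE, matching the true_process column above. ID 4 is FALSE because only the last evaluation is tested, and that one comes after both results.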
Ritchie Sacramento answered Oct 19 '22 00:10