When a user completes a step digitally column is_digitally_signed changes to YES.
What I am trying to do: If any step is completed digitally I want to retrieve all rows for same application_id and user_id. Please check beneath my desired output.
R code to replicate my dataset
df <- data.table(application_id = c(1,1,1,2,2,2,3,3,3),
user_id = c(123,123,123,456,456,456,789,789,789),
application_status = c("incomplete", "details_verified", "complete"),
date = c("01/01/2018", "02/01/2018", "03/01/2018"),
is_digitally_signed = c("NULL", "NULL", "YES", "NULL", "NULL", "NULL", "NULL", "YES", "NULL")) %>%
mutate(date = as.Date(date, "%d/%m/%Y"))
With an output
df
application_id user_id application_status date is_digitally_signed
1 123 incomplete 2018-01-01 NULL
1 123 details_verified 2018-01-02 NULL
1 123 complete 2018-01-03 YES
2 456 incomplete 2018-01-01 NULL
2 456 details_verified 2018-01-02 NULL
2 456 complete 2018-01-03 NULL
3 789 incomplete 2018-01-01 NULL
3 789 details_verified 2018-01-02 YES
3 789 complete 2018-01-03 NULL
My (unsuccessful) effort
df %>% group_by(application_id,user_id) %>% filter_all(all.vars(. == "YES"))
Desired outcome
application_id user_id application_status date is_digitally_signed
1 123 incomplete 2018-01-01 NULL
1 123 details_verified 2018-01-02 NULL
1 123 complete 2018-01-03 YES
3 789 incomplete 2018-01-01 NULL
3 789 details_verified 2018-01-02 YES
3 789 complete 2018-01-03 NULL
We can use filter with any, which checks for a given group whether there is at least one record with is_digitally_signed == 'YES':
library(dplyr)
df %>%
group_by(application_id, user_id) %>%
filter(any(is_digitally_signed == "YES"))
or use the all function to subset groups where not all is_digitally_signed == "NULL":
df %>%
group_by(application_id, user_id) %>%
filter(!all(is_digitally_signed == "NULL"))
We can also use data.table since you've already loaded your data as a DT:
library(data.table)
dt = setDT(df)
dt[dt[,.I[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]$V1,]
or with .SD:
dt[,.SD[any(is_digitally_signed == "YES")], by=.(application_id, user_id)]
Output:
# A tibble: 6 x 5
# Groups: application_id, user_id [2]
application_id user_id application_status date is_digitally_signed
<dbl> <dbl> <fct> <date> <fct>
1 1 123 incomplete 2018-01-01 NULL
2 1 123 details_verified 2018-01-02 NULL
3 1 123 complete 2018-01-03 YES
4 3 789 incomplete 2018-01-01 NULL
5 3 789 details_verified 2018-01-02 YES
6 3 789 complete 2018-01-03 NULL
As there is only a single column to test, we can simply use filter with any
library(dplyr)
df %>%
group_by(application_id,user_id) %>%
filter(any(is_digitally_signed == "YES"))
# A tibble: 6 x 5
# Groups: application_id, user_id [2]
# application_id user_id application_status date is_digitally_signed
# <dbl> <dbl> <chr> <date> <chr>
#1 1 123 incomplete 2018-01-01 NULL
#2 1 123 details_verified 2018-01-02 NULL
#3 1 123 complete 2018-01-03 YES
#4 3 789 incomplete 2018-01-01 NULL
#5 3 789 details_verified 2018-01-02 YES
#6 3 789 complete 2018-01-03 NULL
Or another option is using %in% to return a single TRUE/FALSE output that gets recycled
df %>%
group_by(application_id,user_id) %>%
filter("YES" %in% is_digitally_signed)
Or we can use base R
df[with(df, ave(is_digitally_signed == "YES", application_id,user_id, FUN = any)),]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With