Lookup table based on multiple conditions in R

Thank you for taking a look at my question!

I have the following (dummy) data for patient performance on 3 tasks:

patient_df = data.frame(id = 1:5,
                        age = c(30,72,46,63,58),
                        education = c(11, 22, 18, 12, 14),
                        task1 = c(21, 28, 20, 24, 22),
                        task2 = c(15, 15, 10, 11, 14),
                        task3 = c(82, 60, 74, 78, 78))
> patient_df
  id age education task1 task2 task3
1  1  30        11    21    15    82
2  2  72        22    28    15    60
3  3  46        18    20    10    74
4  4  63        12    24    11    78
5  5  58        14    22    14    78

I also have the following (dummy) lookup table of age- and education-based cutoff values that define a patient's performance on each task as impaired or not impaired:

cutoffs = data.frame(age = rep(seq(from = 35, to = 70, by = 5), 2),
                     education = c(rep("<16", 8), rep(">=16",8)),
                     task1_cutoff = rep(24, 16),
                     task2_cutoff = c(11,11,11,11,10,10,10,10,9,13,13,13,13,12,12,11),
                     task3_cutoff = c(rep(71,8), 70, rep(74,2), rep(73, 5)))
> cutoffs
   age education task1_cutoff task2_cutoff task3_cutoff
1   35       <16           24           11           71
2   40       <16           24           11           71
3   45       <16           24           11           71
4   50       <16           24           11           71
5   55       <16           24           10           71
6   60       <16           24           10           71
7   65       <16           24           10           71
8   70       <16           24           10           71
9   35      >=16           24            9           70
10  40      >=16           24           13           74
11  45      >=16           24           13           74
12  50      >=16           24           13           73
13  55      >=16           24           13           73
14  60      >=16           24           12           73
15  65      >=16           24           12           73
16  70      >=16           24           11           73

My goal is to create 3 new binary variables in patient_df that indicate whether or not a patient is impaired on each task. For example, id = 1 in patient_df is at or below 35 (age 30) and has fewer than 16 years of education, so the cutoff value for task1 would be 24, the cutoff value for task2 would be 11, and the cutoff value for task3 would be 71; scores below these values denote impairment.
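To make the rule concrete, here is a minimal base R sketch of the lookup for id = 1 alone. It assumes, as described above, that age 30 is read from the 35 row and that "impaired" means strictly below the cutoff; `lookup_row`, `scores` and `cut_vals` are illustrative names, not part of the original code.

```r
# Dummy lookup table from the question
cutoffs <- data.frame(age = rep(seq(from = 35, to = 70, by = 5), 2),
                      education = c(rep("<16", 8), rep(">=16", 8)),
                      task1_cutoff = rep(24, 16),
                      task2_cutoff = c(11, 11, 11, 11, 10, 10, 10, 10,
                                       9, 13, 13, 13, 13, 12, 12, 11),
                      task3_cutoff = c(rep(71, 8), 70, rep(74, 2), rep(73, 5)))

# id = 1: age 30 (assumed to use the 35 row), 11 years of education ("<16" group)
lookup_row <- cutoffs[cutoffs$age == 35 & cutoffs$education == "<16", ]
scores <- c(task1 = 21, task2 = 15, task3 = 82)
cut_vals <- unlist(lookup_row[, c("task1_cutoff", "task2_cutoff", "task3_cutoff")])

# 1 if the score is strictly below the cutoff, 0 otherwise
impaired <- as.integer(scores < cut_vals)
names(impaired) <- paste0(names(scores), "_impaired")
impaired
```

Under this rule id = 1 is impaired on task1 only (21 < 24), since 15 is not below 11 and 82 is not below 71.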

I would like to do this for each id by referencing the age and education-associated cutoff value in the cutoff dataset, so that the outcome would look something like this:

> goal_patient_df
  id age education task1 task2 task3 task1_impaired task2_impaired task3_impaired
1  1  30        11    21    15    82              1              0              0
2  2  72        22    28    15    60              0              0              1
3  3  46        18    20    10    74              1              1              0
4  4  63        12    24    11    78              0              0              0
5  5  58        14    22    14    78              1              0              0

In reality, my patient_df has 600+ patients and there are 7+ tasks, each with age- and education-based cutoff values, so a 'clean' way of doing this would be greatly appreciated! The only alternative I can think of right now is writing a TON of if_else() or case_when() statements, which would not be very reproducible for anyone else who uses my code :(

Thank you in advance!

zoey107 asked Oct 15 '22 03:10


1 Answer

I would recommend putting both your lookup table and patient_df in long form, with one row per task. That is easier to manage with many tasks, because the task number becomes a join key instead of being spread across column names.

Your education column is numeric, so converting it to the character values "<16" or ">=16" lets it match the education column of the lookup table.
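A small base R sketch of that recode, using the education values from the question. The `cut()` variant is an alternative, not what the answer uses; with `right = FALSE` the intervals are [-Inf, 16) and [16, Inf), so exactly 16 years lands in ">=16", the same as the `ifelse()` version.

```r
education <- c(11, 22, 18, 12, 14)

# Two-group recode, as in the answer's mutate() step
edu_group <- ifelse(education < 16, "<16", ">=16")

# Same grouping with cut(); more breaks/labels would give more education groups
edu_cut <- as.character(cut(education, breaks = c(-Inf, 16, Inf),
                            labels = c("<16", ">=16"), right = FALSE))
edu_group
```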

fuzzy_inner_join matches each patient row to the lookup-table rows where task and education are equal (==) and age lies between age_low and age_high, which give the range of ages each lookup row covers (here age to age + 4).
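If you would rather avoid the fuzzyjoin dependency, the range match can also be done with a different technique: base R `findInterval()` to map each age to its band, then an ordinary exact `merge()`. This is a sketch under assumptions: the age values in `cutoffs` are the lower bounds of 5-year bands, the top band (70) is open-ended (the answer's version stops at 74), and every cutoff column is named `<task>_cutoff`. The helper names (`band`, `lookup`, `age_band`, `edu_group`) are illustrative. Because the cutoff table stays wide, the result already has one `_impaired` column per task.

```r
# Dummy data from the question
patient_df <- data.frame(id = 1:5,
                         age = c(30, 72, 46, 63, 58),
                         education = c(11, 22, 18, 12, 14),
                         task1 = c(21, 28, 20, 24, 22),
                         task2 = c(15, 15, 10, 11, 14),
                         task3 = c(82, 60, 74, 78, 78))
cutoffs <- data.frame(age = rep(seq(from = 35, to = 70, by = 5), 2),
                      education = c(rep("<16", 8), rep(">=16", 8)),
                      task1_cutoff = rep(24, 16),
                      task2_cutoff = c(11, 11, 11, 11, 10, 10, 10, 10,
                                       9, 13, 13, 13, 13, 12, 12, 11),
                      task3_cutoff = c(rep(71, 8), 70, rep(74, 2), rep(73, 5)))

# findInterval() returns i such that band[i] <= age < band[i + 1], so 46 -> 45
# and anything >= 70 -> 70. Ages below 35 give 0, recoded to NA so they stay
# unmatched, like id 1 in the answer.
band <- sort(unique(cutoffs$age))
idx <- findInterval(patient_df$age, band)
idx[idx == 0] <- NA
patient_df$age_band <- band[idx]
patient_df$edu_group <- ifelse(patient_df$education < 16, "<16", ">=16")

# Rename the lookup keys so a plain exact merge() can be used;
# all.x = TRUE keeps unmatched patients with NA cutoffs
lookup <- cutoffs
names(lookup)[names(lookup) == "age"] <- "age_band"
names(lookup)[names(lookup) == "education"] <- "edu_group"
res <- merge(patient_df, lookup, by = c("age_band", "edu_group"), all.x = TRUE)
res <- res[order(res$id), ]
rownames(res) <- NULL

# One comparison per task column, so adding tasks needs no extra code:
# 1 if the score is below its cutoff, 0 otherwise, NA if no cutoff matched
tasks <- grep("^task\\d+$", names(patient_df), value = TRUE)
for (tk in tasks) {
  res[[paste0(tk, "_impaired")]] <- as.integer(res[[tk]] < res[[paste0(tk, "_cutoff")]])
}
res <- res[, c("id", "age", "education", tasks, paste0(tasks, "_impaired"))]
res
```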

Finally, impaired is 1 when the patient's score is below the cutoff for that task and 0 otherwise.

Note that id 1 is missing from the output, because age 30 falls outside the age ranges in the lookup table. You can add more rows to that table to cover it.
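Instead of adding rows, another option is to make the outermost bands open-ended. This minimal base R sketch uses plain vectors standing in for the `age_low`/`age_high` columns built from the lookup table, and assumes (as the question does for id 1) that patients younger than 35 should use the 35 row, plus the further assumption that patients older than 74 should use the 70 row. In the fuzzy-join pipeline, the equivalent change would be setting the smallest `age_low` to `-Inf` and the largest `age_high` to `Inf` in `cutoffs_long`.

```r
band <- seq(35, 70, by = 5)   # lower bounds of the age rows in the lookup table
age_low <- band
age_high <- band + 4          # as in the answer: 35-39, 40-44, ..., 70-74

# Assumption: open-ended outer bands (under 35 -> 35 row, over 74 -> 70 row)
age_low[1] <- -Inf
age_high[length(age_high)] <- Inf

# Which band does each patient age fall into?
ages <- c(30, 72, 46, 63, 58)
matched <- sapply(ages, function(a) band[a >= age_low & a <= age_high])
matched
```

With these edges every age from the question matches exactly one band, so id 1 would no longer be dropped.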

library(tidyverse)
library(fuzzyjoin)

# Long lookup table: one row per age band, education group and task;
# each band covers ages age to age + 4
cutoffs_long <- cutoffs %>%
  pivot_longer(cols = starts_with("task"), names_to = "task", values_to = "cutoff_value", names_pattern = "task(\\d+)") %>%
  mutate(age_low = age,
         age_high = age + 4) %>%
  select(-age)

# Long patient data with education recoded to the lookup table's groups
patient_df %>%
  pivot_longer(cols = starts_with("task"), names_to = "task", values_to = "patient_value", names_pattern = "(\\d+)") %>%
  mutate(education = ifelse(education < 16, "<16", ">=16")) %>%
  # age must lie in [age_low, age_high]; education and task must match exactly
  fuzzy_inner_join(cutoffs_long, by = c("age" = "age_low", "age" = "age_high", "education", "task"), match_fun = list(`>=`, `<=`, `==`, `==`)) %>%
  # 1 if the score is below the cutoff, 0 otherwise
  mutate(impaired = +(patient_value < cutoff_value))

Output

# A tibble: 12 x 11
      id   age education.x task.x patient_value education.y task.y cutoff_value age_low age_high impaired
   <int> <dbl> <chr>       <chr>          <dbl> <chr>       <chr>         <dbl>   <dbl>    <dbl>    <int>
 1     2    72 >=16        1                 28 >=16        1                24      70       74        0
 2     2    72 >=16        2                 15 >=16        2                11      70       74        0
 3     2    72 >=16        3                 60 >=16        3                73      70       74        1
 4     3    46 >=16        1                 20 >=16        1                24      45       49        1
 5     3    46 >=16        2                 10 >=16        2                13      45       49        1
 6     3    46 >=16        3                 74 >=16        3                74      45       49        0
 7     4    63 <16         1                 24 <16         1                24      60       64        0
 8     4    63 <16         2                 11 <16         2                10      60       64        0
 9     4    63 <16         3                 78 <16         3                71      60       64        0
10     5    58 <16         1                 22 <16         1                24      55       59        1
11     5    58 <16         2                 14 <16         2                10      55       59        0
12     5    58 <16         3                 78 <16         3                71      55       59        0
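The result above is in long form, while the question asks for one `task*_impaired` column per task. A base R sketch of that last step with `reshape()` and `merge()`: the `long` data frame is copied from the `id`, `task` and `impaired` columns of the output above rather than recomputed, so the block runs without fuzzyjoin. In the tidyverse pipeline, selecting `id`, `task.x` and `impaired` and passing them to `tidyr::pivot_wider()` would be the analogous step.

```r
# id, task and impaired copied from the output above (id 1 had no matching band)
long <- data.frame(id = rep(2:5, each = 3),
                   task = rep(c("1", "2", "3"), times = 4),
                   impaired = c(0L, 0L, 1L,
                                1L, 1L, 0L,
                                0L, 0L, 0L,
                                1L, 0L, 0L))

# One row per id; columns impaired.1, impaired.2, impaired.3
wide <- reshape(long, idvar = "id", timevar = "task", v.names = "impaired",
                direction = "wide")
names(wide) <- sub("^impaired\\.(\\d+)$", "task\\1_impaired", names(wide))
rownames(wide) <- NULL

# Attach to the original patient data; all.x = TRUE keeps id 1 (as NA)
patient_df <- data.frame(id = 1:5,
                         age = c(30, 72, 46, 63, 58),
                         education = c(11, 22, 18, 12, 14),
                         task1 = c(21, 28, 20, 24, 22),
                         task2 = c(15, 15, 10, 11, 14),
                         task3 = c(82, 60, 74, 78, 78))
goal <- merge(patient_df, wide, by = "id", all.x = TRUE)
goal
```

id 1 comes out as NA here because it was dropped by the join; extending the lookup table's age ranges would fill it in.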
Ben answered Nov 11 '22 23:11