
Create a column based on different group conditions

I have a dataset with several columns. It looks like this:

df <- data.frame(
  PatientID = c("0002", "0004", "0005", "0006", "0009", "0010", "0018", "0019", "0020", "0027", "0039", "0041", "0042", "0043", "0044", "0045", "0046", "0047", "0048", "0049", "0055"),
  A = c(987.805, 977.146, 790.809, 964.315, 1014.020, 952.311, 992.967, 950.797, 958.975, 960.712, 958.117, 947.465, 902.852, 961.417, 985.124, 994.178, 930.141, 1007.790, 948.848, 1027.110, 999.414),
  B = c(998.988, 972.606, 998.680, 955.037, 972.941, 1020.560, 947.751, 1029.560, 955.540, 911.606, 964.039, NA, 988.087, 902.367, 959.338, 1029.050, 925.162, 987.374, 1066.400, 957.512, 917.597),
  C = c(975.634, 987.140, 961.810, 929.466, 978.166, 1005.820, 925.752, 969.469, 943.398, 936.034, 965.292, 996.404, 920.610, 967.047, 986.565, 913.517, 893.428, 921.606, 976.192, 929.590, 950.493),
  D = c(975.634, 987.140, 961.810, 929.466, 978.166, 1005.820, 925.752, 969.469, 943.398, 936.034, 965.292, 996.404, 920.610, 967.047, 986.565, 913.517, 893.428, 921.606, 976.192, 929.590, 950.493),
  E = c(1006.330, 1028.070, 975.554, 954.274, 1005.910, 949.969, 992.820, 977.048, 934.407, 948.913, 944.578, 917.564, 975.301, 961.375, 955.296, 961.128, 998.119, 1009.110, 994.891, 1000.170, 982.763),
  G = c(951.684, 958.990, 944.432, 944.654, 924.680, 955.927, 972.674, 949.384, 973.348, 984.392, 943.894, 961.468, 995.368, 994.997, 973.175, 979.454, 952.605, 930.744, NA, 1015.150, 956.507),
  stringsAsFactors = FALSE
)

I need to create an extra column, called above_threshold, that is TRUE or FALSE based on the following conditions:

To be TRUE, the patient needs to have values above the threshold in at least 3 of the 6 columns (A, B, C, D, E or G). The thresholds are:

  • for A and B -> 990
  • for C and D -> 1000
  • for E and G -> 1005

Otherwise it is FALSE.

In other words, 3 or more columns need to be above their threshold for the final column to be TRUE. The expected output was shown as an image, with the values above threshold painted in green.

How could I set this up? I hope this is clear, but ask away if it is not!

Many thanks!

Lili asked Mar 01 '23 13:03


1 Answer

We create a named list (or a named vector) of thresholds and loop across the columns other than 'PatientID' with across(). For each column, we extract the matching list element by column name (cur_column()) and store the comparison as a new logical column, adding the suffix _new via .names. We then use rowSums to check whether the number of TRUE values per row is greater than or equal to 3, which creates 'above_threshold'. The final select() drops the helper columns again.

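library(dplyr)

# Threshold for each column, keyed by column name
lst1 <- list(A = 990, B = 990, C = 1000, D = 1000, E = 1005, G = 1005)

df %>%
    # One logical helper column per measurement: A_new, B_new, ...
    mutate(across(A:G, ~ . > lst1[[cur_column()]],
                  .names = '{.col}_new'),
           # Count TRUE per row; NA comparisons are skipped by na.rm = TRUE.
           # Note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick().
           above_threshold = rowSums(select(cur_data(), ends_with('new')),
                                     na.rm = TRUE) >= 3) %>%
    # Keep the original columns plus the new flag
    select(all_of(names(df)), above_threshold)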

-output

   PatientID        A        B        C        D        E        G above_threshold
1       0002  987.805  998.988  975.634  975.634 1006.330  951.684           FALSE
2       0004  977.146  972.606  987.140  987.140 1028.070  958.990           FALSE
3       0005  790.809  998.680  961.810  961.810  975.554  944.432           FALSE
4       0006  964.315  955.037  929.466  929.466  954.274  944.654           FALSE
5       0009 1014.020  972.941  978.166  978.166 1005.910  924.680           FALSE
6       0010  952.311 1020.560 1005.820 1005.820  949.969  955.927            TRUE
7       0018  992.967  947.751  925.752  925.752  992.820  972.674           FALSE
8       0019  950.797 1029.560  969.469  969.469  977.048  949.384           FALSE
9       0020  958.975  955.540  943.398  943.398  934.407  973.348           FALSE
10      0027  960.712  911.606  936.034  936.034  948.913  984.392           FALSE
11      0039  958.117  964.039  965.292  965.292  944.578  943.894           FALSE
12      0041  947.465       NA  996.404  996.404  917.564  961.468           FALSE
13      0042  902.852  988.087  920.610  920.610  975.301  995.368           FALSE
14      0043  961.417  902.367  967.047  967.047  961.375  994.997           FALSE
15      0044  985.124  959.338  986.565  986.565  955.296  973.175           FALSE
16      0045  994.178 1029.050  913.517  913.517  961.128  979.454           FALSE
17      0046  930.141  925.162  893.428  893.428  998.119  952.605           FALSE
18      0047 1007.790  987.374  921.606  921.606 1009.110  930.744           FALSE
19      0048  948.848 1066.400  976.192  976.192  994.891       NA           FALSE
20      0049 1027.110  957.512  929.590  929.590 1000.170 1015.150           FALSE
21      0055  999.414  917.597  950.493  950.493  982.763  956.507           FALSE
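
For comparison, the same rule can probably be written in base R without dplyr. This is only a minimal sketch and not part of the approach above: it uses three rows copied from the question's data (patients 0010, 0041 and 0049) so that it runs on its own, and the names `sub`, `thresholds` and `above` are made up for illustration. sweep() compares every column with its own threshold in one call.

```r
# Three rows copied from the question's data, so the sketch is self-contained
sub <- data.frame(
  PatientID = c("0010", "0041", "0049"),
  A = c(952.311, 947.465, 1027.110),
  B = c(1020.560, NA, 957.512),
  C = c(1005.820, 996.404, 929.590),
  D = c(1005.820, 996.404, 929.590),
  E = c(949.969, 917.564, 1000.170),
  G = c(955.927, 961.468, 1015.150),
  stringsAsFactors = FALSE
)

# Named threshold vector (hypothetical name); the names pick the columns to test
thresholds <- c(A = 990, B = 990, C = 1000, D = 1000, E = 1005, G = 1005)

# Logical matrix: TRUE where a value exceeds the threshold of its column
above <- sweep(as.matrix(sub[names(thresholds)]), 2, thresholds, FUN = ">")

# NA comparisons are dropped from the count by na.rm = TRUE
sub$above_threshold <- rowSums(above, na.rm = TRUE) >= 3

print(sub[c("PatientID", "above_threshold")])
```

Only patient 0010 should come out TRUE here, which agrees with the output above. Indexing the data frame by names(thresholds) keeps the columns and thresholds in the same order, so the comparison does not depend on the column layout of the data.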
akrun answered Mar 05 '23 17:03