I have a dataset like the one below. It has many more variables, but I only show a few here:
library(data.table)
dt <- data.table(ID = c(1, 2, 3, 4, 5),
diagnosis1 = c(0, 0, 1, 0, 1),
diagnosis2 = c(1, 0, 0, 1, 0),
diagnosis3 = c(0, 1, 1, 0, 1),
diagnosis4 = c(1, 0, 1, 0, 0))
There are 5 patients and 4 types of diagnosis. A patient can have diagnosis1 and also diagnosis3 (e.g. patient 5), but in my final dataset each patient is only allowed one diagnosis. The priority order is: diagnosis1, diagnosis2, diagnosis3, diagnosis4. So in this case patient 5 should only get diagnosis1.
My real dataset is large and has many other variables besides the 5 shown above. The output should keep the same structure, but every 1 that does not belong to the chosen diagnosis should be replaced with 0.
Hope you can help!
Create a vector of your column names, and then use min(which()) by ID:
Either specify the dx priority explicitly:
dx_priority = c("diagnosis3", "diagnosis1", "diagnosis4", "diagnosis2")
dt[, f_dx := dx_priority[min(which(.SD == 1))], by = ID, .SDcols = dx_priority]
Or let the order of the diagnosis columns dictate the priority:
dx_cols <- names(dt)[ grepl("^diagnosis", names(dt)) ]
dt[, f_dx := dx_cols[min(which(.SD == 1))], by = ID, .SDcols = dx_cols ]
Output (using the column order as the priority):
ID diagnosis1 diagnosis2 diagnosis3 diagnosis4 f_dx
1: 1 0 1 0 1 diagnosis2
2: 2 0 0 1 0 diagnosis3
3: 3 1 0 1 1 diagnosis1
4: 4 0 1 0 0 diagnosis2
5: 5 1 0 1 0 diagnosis1
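Note that f_dx only names the chosen diagnosis; the original 0/1 columns are left unchanged. If you also want the non-chosen 1's set to 0, as the question asks, here is one possible sketch. It assumes the diagnosis columns contain only 0 and 1 (no NA values) and that dx_cols lists them in priority order. The names seen and hit are just illustrative helpers. The idea is to walk the columns in priority order and keep a 1 only where no higher-priority diagnosis has already been found for that row:

```r
library(data.table)

dt <- data.table(ID = c(1, 2, 3, 4, 5),
                 diagnosis1 = c(0, 0, 1, 0, 1),
                 diagnosis2 = c(1, 0, 0, 1, 0),
                 diagnosis3 = c(0, 1, 1, 0, 1),
                 diagnosis4 = c(1, 0, 1, 0, 0))

# Columns in priority order: earlier entries win (assumed order from the question)
dx_cols <- c("diagnosis1", "diagnosis2", "diagnosis3", "diagnosis4")

# seen[i] becomes TRUE once row i has hit a diagnosis in a higher-priority column
seen <- rep(FALSE, nrow(dt))

for (col in dx_cols) {
  hit <- dt[[col]] == 1
  # Keep the 1 only if no higher-priority diagnosis was found for this row
  set(dt, j = col, value = as.numeric(hit & !seen))
  seen <- seen | hit
}

print(dt)
```

Because set() modifies the table by reference, the other variables in a larger dataset are not touched. A patient with no diagnosis at all simply keeps 0 in every diagnosis column.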