My data looks like the following:
ID Diagnosis_1 Diagnosis_2 Diagnosis_3 Diagnosis_4
A 1 0 0 0
A 1 0 0 0
A 1 0 0 0
B 0 1 0 0
C 0 0 0 1
C 0 1 0 0
D 0 0 0 1
E 0 0 1 0
E 0 1 0 0
E 0 0 1 0
Diagnosis_1:Diagnosis_4 are all binary, representing presence (1) or absence (0) of the diagnosis. What I'd like to do is create a data frame that looks like this:
ID Diagnosis
A 1
A 1
A 1
B 2
C 4
C 2
D 4
E 3
E 2
E 3
No matter how many times I read the documentation on reshape/reshape2/tidyr I just can't manage to wrap my head around their implementation.
I can solve my problem using dplyr's mutate but it's a time-intensive, roundabout way to achieve my goal.
EDIT: Data edited to more realistically represent my actual data frame.
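Try matrix multiplication. Assuming each row contains exactly one 1, multiplying the 0/1 matrix by the vector 1, 2, ..., k returns the position of that 1, which is the diagnosis number:
# number of columns, including ID
nc <- ncol(DF)
# drop the ID column, then multiply the 0/1 matrix by 1:(nc-1)
data.frame(ID = DF$ID, Diagnosis = as.matrix(DF[-1]) %*% seq(nc-1))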
giving:
ID Diagnosis
1 A 1
2 B 2
3 C 2
4 D 4
5 E 3
Note: We used the following input, which has one row per ID:
Lines <- "ID Diagnosis_1 Diagnosis_2 Diagnosis_3 Diagnosis_4
A 1 0 0 0
B 0 1 0 0
C 0 1 0 0
D 0 0 0 1
E 0 0 1 0"
DF <- read.table(text = Lines, header = TRUE)
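The same idea also works on data with repeated IDs, like the edited data in the question, because the multiplication acts on each row independently. Below is a minimal sketch under that assumption; the names DF2, Lines2, m, by_product, by_maxcol and result are introduced here for illustration only. It also shows base R's max.col() as an alternative that reads off the column position of the 1 directly, and it checks up front that every row really has exactly one 1, since both approaches rely on that.

```r
# Rebuild the edited data from the question (repeated IDs, one diagnosis per row).
Lines2 <- "ID Diagnosis_1 Diagnosis_2 Diagnosis_3 Diagnosis_4
A 1 0 0 0
A 1 0 0 0
A 1 0 0 0
B 0 1 0 0
C 0 0 0 1
C 0 1 0 0
D 0 0 0 1
E 0 0 1 0
E 0 1 0 0
E 0 0 1 0"
DF2 <- read.table(text = Lines2, header = TRUE)

# 0/1 indicator matrix without the ID column.
m <- as.matrix(DF2[-1])

# Both approaches assume exactly one 1 per row; fail early otherwise.
stopifnot(all(rowSums(m) == 1))

# Approach 1: matrix multiplication by 1:k.
# drop() turns the one-column matrix result into a plain vector.
by_product <- drop(m %*% seq_len(ncol(m)))

# Approach 2: max.col() gives the column index of the largest entry in each row.
by_maxcol <- max.col(m, ties.method = "first")

result <- data.frame(ID = DF2$ID, Diagnosis = by_product)
print(result)
```

With this input, result has the same ID and Diagnosis values as the desired table in the question. If a row could hold more than one diagnosis, the product would return the sum of the positions rather than a single code, so the rowSums() check is worth keeping.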