I have a data set like below. Each patient has 3 visits and they can transition between the 3 states from visit to visit.
ID <- c(1,1,1,2,2,2,3,3,3)
Visit <- c(1,2,3,1,2,3,1,2,3)
State <- c(2,1,1,3,2,1,2,3,1)
I want to make a data frame that counts the transitions between states from visit 1 to visit 2. For Visit 1 to Visit 2, the result should be a matrix in which the rows represent the state at visit 1 and the columns represent the state at visit 2. Entries on the diagonal represent counts of participants who did not transition.

Although there is no harm in using other packages, this can easily be done using only table in base R (plus a minor step if the data is incomplete).
You probably have your data in a data.frame, so we'll build one from your sample data. I'll also make slight adjustments to the variables (IDs as letters, visits as "V1", "V2", etc.), for readability.
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1)))
If the dataset is complete, or if the missing values are explicit (i.e. if there is an explicit entry for each visit of each patient, even if the State is NA), then a simple table is sufficient. We just need to turn State into a factor first, so that states that don't occur at a given visit still appear in the table, and to order the data.frame by ID and Visit, so that the two State vectors line up patient by patient:
ddff$State <- factor(ddff$State)
ddff <- ddff[order(ddff$ID, ddff$Visit), ]
table(ddff$State[ddff$Visit == "V1"],
      ddff$State[ddff$Visit == "V2"],
      dnn = c("V1", "V2"))
    V2
V1   S1 S2 S3
  S1  0  0  0
  S2  1  0  1
  S3  0  1  0
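The call above depends on the two State vectors being in the same patient order, which is what the sort guarantees. As a minimal sketch of an alternative that does not rely on row order, the two visits can be paired explicitly by ID with merge. The names v1, v2, both and tab_v1_v2 are only illustrative:

```r
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1))))

# One row per patient for each of the two visits of interest
v1 <- ddff[ddff$Visit == "V1", c("ID", "State")]
v2 <- ddff[ddff$Visit == "V2", c("ID", "State")]

# Pair the visits by patient ID; the shared State column is
# disambiguated with suffixes, giving State.V1 and State.V2
both <- merge(v1, v2, by = "ID", suffixes = c(".V1", ".V2"))

tab_v1_v2 <- table(both$State.V1, both$State.V2, dnn = c("V1", "V2"))
tab_v1_v2
```

Because the pairing is done on ID, the result should be the same table regardless of how the rows of ddff happen to be ordered.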
There will be non-zero values on the diagonal if any patients don't change state. E.g. for Visit 2 to Visit 3:
table(ddff$State[ddff$Visit == "V2"],
      ddff$State[ddff$Visit == "V3"],
      dnn = c("V2", "V3"))
    V3
V2   S1 S2 S3
  S1  1  0  0
  S2  1  0  0
  S3  1  0  0
But if you really don't want them, you can easily assign zeros to the diagonal:
tt <- table(ddff$State[ddff$Visit == "V2"],
            ddff$State[ddff$Visit == "V3"],
            dnn = c("V2", "V3"))
diag(tt) <- 0
tt
    V3
V2   S1 S2 S3
  S1  0  0  0
  S2  1  0  0
  S3  1  0  0
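Since the same call is repeated for each pair of visits, it can be wrapped in a small helper. This is only a sketch: transition_table is a hypothetical name, not a base R function, and it assumes complete data (one row per patient per visit) with State stored as a factor.

```r
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1))))

# Hypothetical helper: cross-tabulate State at visit `from`
# against State at visit `to`
transition_table <- function(df, from, to) {
  df <- df[order(df$ID, df$Visit), ]  # align rows by patient
  table(df$State[df$Visit == from],
        df$State[df$Visit == to],
        dnn = c(from, to))
}

# One table per pair of consecutive visits (V1 to V2, V2 to V3)
visits <- sort(unique(ddff$Visit))
tabs <- Map(function(a, b) transition_table(ddff, a, b),
            head(visits, -1), tail(visits, -1))
tabs
```

The first element of tabs is the V1 to V2 table and the second is the V2 to V3 table, so diag(tabs[[2]]) <- 0 would zero out the non-transitions for that pair only.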
If there are missing values in the dataset, i.e. if there is not a line for each visit of each patient, the same approach can be used, but we need to fill in the missing data points by joining the data.frame with a combination of all possible IDs and visits.
First we'll drop V2 for patient B, to create an incomplete data.frame:
ddff2 <- ddff[-5, ]
ddff2
  ID Visit State
1  A    V1    S2
2  A    V2    S1
3  A    V3    S1
4  B    V1    S3
6  B    V3    S1
7  C    V1    S2
8  C    V2    S3
9  C    V3    S1
Then we use expand.grid to create a data.frame with all possible combinations of ID and Visit, and then use merge to cross it with our data set. This will turn the implicit missing values into explicit missing values:
ddff2 <- merge(
  ddff2,
  expand.grid(ID = unique(ddff2$ID), Visit = unique(ddff2$Visit)),
  all.y = TRUE)
ddff2
  ID Visit State
1  A    V1    S2
2  A    V2    S1
3  A    V3    S1
4  B    V1    S3
5  B    V2  <NA>
6  B    V3    S1
7  C    V1    S2
8  C    V2    S3
9  C    V3    S1
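As an alternative sketch (not part of the approach above), base R's reshape can spread the visits into one column per visit, which also turns the implicit gap into an explicit NA. The column names State.V1, State.V2 and State.V3 are the ones reshape generates from the State and Visit columns; wide and tab_wide are illustrative names.

```r
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1))))

# Incomplete data: drop V2 for patient B
ddff2 <- ddff[-5, ]

# One row per patient, one State column per visit;
# visits without a row become NA
wide <- reshape(ddff2, idvar = "ID", timevar = "Visit", direction = "wide")
wide

# Each row is one patient, so the columns are already aligned
tab_wide <- table(wide$State.V1, wide$State.V2, dnn = c("V1", "V2"))
tab_wide
```

In the wide layout every row is one patient, so no sorting step is needed before calling table.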
We can now use the same approach as earlier:
table(ddff2$State[ddff2$Visit == "V1"],
      ddff2$State[ddff2$Visit == "V2"],
      dnn = c("V1", "V2"))
    V2
V1   S1 S2 S3
  S1  0  0  0
  S2  1  0  1
  S3  0  0  0
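Note that patient B no longer appears in this table, because table ignores NA values by default. If those patients should stay visible, one option is table's useNA argument. A minimal sketch, rebuilding the incomplete data from scratch so that it runs on its own (tab_na is an illustrative name):

```r
ddff <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  Visit = rep(c("V1", "V2", "V3"), 3),
  State = factor(paste0("S", c(2, 1, 1, 3, 2, 1, 2, 3, 1))))

# Drop V2 for patient B, then make the gap explicit again
ddff2 <- merge(
  ddff[-5, ],
  expand.grid(ID = unique(ddff$ID), Visit = unique(ddff$Visit)),
  all.y = TRUE)
ddff2 <- ddff2[order(ddff2$ID, ddff2$Visit), ]  # align rows by patient

# useNA = "ifany" adds an NA category wherever NA values occur
tab_na <- table(ddff2$State[ddff2$Visit == "V1"],
                ddff2$State[ddff2$Visit == "V2"],
                dnn = c("V1", "V2"),
                useNA = "ifany")
tab_na
```

Here the V2 dimension gains an extra NA column, and patient B is counted in the S3 row of that column instead of being dropped.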