Assume I have a data set with an arbitrary number of rows and columns like shown below.
tmp <- tibble(id = 1:10,
v1 = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
v2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
v3 = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0),
v4 = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 0))
Each row is a response. The respondent has either said yes (1) or no (0) to a specific question. Here, we have 4 questions.
What is the easiest way to convert this into a concordance matrix like below:
v1 v2 v3 v4
v1 3 2 1 1
v2 2 2 1 1
v3 1 1 2 2
v4 1 1 2 3
Where each cell shows of those who answers yes to the question on the row, how many also answered yes to the question on the column.
Please note that the number of questions maybe bigger than 4, so I prefer not to have to hard code variable names in the solution. I can make sure the variable names always follow a specific format if that is helpful. A solution that doesn't care about variable names is ideal (we can drop the id column if needed).
The easiest way is with matrix multiplication...
mx <- as.matrix(tmp[,-1])
t(mx) %*% mx
v1 v2 v3 v4
v1 3 2 1 2
v2 2 2 1 1
v3 1 1 2 2
v4 2 1 2 3
crossprod(mx)
will do the same thing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With