Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concordance matrix

Tags:

r

tidyverse

Assume I have a data set with an arbitrary number of rows and columns like shown below.

tmp <- tibble(id = 1:10,
              v1 = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
              v2 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
              v3 = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0),
              v4 = c(0, 0, 0, 1, 1, 0, 0, 0, 1, 0))

Each row is a response. The respondent has either said yes (1) or no (0) to a specific question. Here, we have 4 questions.

What is the easiest way to convert this into a concordance matrix like below:

   v1  v2  v3  v4
v1 3   2   1   1
v2 2   2   1   1
v3 1   1   2   2
v4 1   1   2   3

Where each cell shows of those who answers yes to the question on the row, how many also answered yes to the question on the column.

Please note that the number of questions maybe bigger than 4, so I prefer not to have to hard code variable names in the solution. I can make sure the variable names always follow a specific format if that is helpful. A solution that doesn't care about variable names is ideal (we can drop the id column if needed).

like image 506
Merik Avatar asked Aug 31 '25 20:08

Merik


1 Answers

The easiest way is with matrix multiplication...

mx <- as.matrix(tmp[,-1])

t(mx) %*% mx

   v1 v2 v3 v4
v1  3  2  1  2
v2  2  2  1  1
v3  1  1  2  2
v4  2  1  2  3

crossprod(mx) will do the same thing.

like image 119
Andrew Gustar Avatar answered Sep 08 '25 18:09

Andrew Gustar