
Generating and Summing Matrix

Tags: r, matrix

I'm relatively new to R, so forgive me for what I believe to be a relatively simple question.

I have data in the form

    1   2   3   4   5
A   0   1   1   0   0
B   1   0   1   0   1
C   0   1   0   1   0
D   1   0   0   0   0
E   0   0   0   0   1

where A-E are people and 1-5 are binary indicators of whether or not they have that quality. I need to make a matrix of A-E where cell A,B = 1 if, for any quality 1-5, the values for A and B sum to 2 (that is, if they share at least one quality). The simple 5x5 would be:

    A   B   C   D   E
A   1               
B   1   1           
C   1   0   1       
D   0   1   0   1   
E   0   1   0   0   1

I then need to sum the entire matrix. (Above would be 9). I have thousands of observations, so I can't do this by hand. I'm sure there is an easy few lines of code, I'm just not experienced enough.

Thanks!

EDIT: I've imported the data from a .csv file with the columns (1-5 above) as variables; in the real data I have 40 variables. A-E are unique ID observations of people, approximately 2000. I would also like to know how to first convert this into a matrix, in order to execute the great answers you have already provided. Thanks!

ChrisDH asked Apr 14 '15 19:04




1 Answer

You can use matrix multiplication here; each cell of the result counts the qualities that two people share:

out <- tcrossprod(m)
#   A B C D E
# A 2 1 1 0 0
# B 1 3 0 1 1
# C 1 0 2 0 0
# D 0 1 0 1 0
# E 0 1 0 0 1
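The call above assumes that `m` is a numeric matrix with people as rows and qualities as columns. A minimal sketch of how it might be built, either directly from the example data or from a data frame read from a .csv file, is below. The file name `people.csv` and the assumption that the first column holds the person IDs are hypothetical, so adjust them to the real file.

```r
# Example data from the question: rows are people, columns are qualities.
m <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 0, 1,
              0, 1, 0, 1, 0,
              1, 0, 0, 0, 0,
              0, 0, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], as.character(1:5)))

# For the real data, something along these lines may work.
# Hypothetical: the file is called people.csv and its first column holds the IDs.
# df <- read.csv("people.csv", row.names = 1)
# m  <- as.matrix(df)
```

If the ID column is not the first one, it would need to be moved into the row names (or dropped) before calling `as.matrix()`, so that only the 0/1 columns end up in `m`.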

Then set the diagonal to one, if required:

diag(out) <- 1

As DavidA points out in the comments, tcrossprod is basically doing m %*% t(m).
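A quick way to check that equivalence on the example data from the question (the variable name `same` is only illustrative):

```r
# Example data from the question: rows are people, columns are qualities.
m <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 0, 1,
              0, 1, 0, 1, 0,
              1, 0, 0, 0, 0,
              0, 0, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], as.character(1:5)))

# Compare the two ways of computing the person-by-person matrix.
same <- all.equal(tcrossprod(m), m %*% t(m))
same
```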

There are several ways to then calculate the sum; here is one:

sum(out[upper.tri(out, diag = TRUE)], na.rm = TRUE)
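One caveat: the off-diagonal cells of tcrossprod(m) count shared qualities, so with 40 variables they can exceed 1, whereas the question asks for 1 whenever at least one quality is shared. A sketch of the whole calculation that converts the counts to 0/1 before summing is below. The names `counts`, `shared` and `total` are illustrative, and this is one possible approach rather than the only one.

```r
# Example data from the question: rows are people, columns are qualities.
m <- matrix(c(0, 1, 1, 0, 0,
              1, 0, 1, 0, 1,
              0, 1, 0, 1, 0,
              1, 0, 0, 0, 0,
              0, 0, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(LETTERS[1:5], as.character(1:5)))

# Number of shared qualities for every pair of people.
counts <- tcrossprod(m)

# 1 if a pair shares at least one quality, 0 otherwise.
shared <- (counts > 0) * 1

# Pair each person with themselves, as in the question's expected matrix.
diag(shared) <- 1

# Count each unordered pair once: upper triangle plus the diagonal.
total <- sum(shared[upper.tri(shared, diag = TRUE)])
total
```

On the example data this gives 9, matching the total stated in the question.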
user20650 answered Oct 25 '22 14:10