Count the number of instances where a variable or a combination of variables is TRUE

Tags: for-loop, r

I'm an enthusiastic R newbie who needs some help! :)

I have a data frame that looks like this:

id<-c(100,200,300,400)
a<-c(1,1,0,1)
b<-c(1,0,1,0)
c<-c(0,0,1,1)

y=data.frame(id=id,a=a,b=b,c=c)

Here id is a unique identifier (e.g. a person), and a, b and c are dummy variables indicating whether the person has that feature (as usual, 1 = TRUE).

I want R to create a matrix or data frame with the variables a, b and c as both the column names and the row names. Each cell should hold the number of identifiers that have that feature (on the diagonal) or that combination of features (off the diagonal).

For example, IDs 100, 200 and 400 have feature a, so the diagonal cell where a crosses a should be 3. Only ID 100 has both a and b, so the cell where a and b cross should be 1, and so forth.

The result should look like this:

l<-c("","a","b","c")
m<-c("a",3,1,1)
n<-c("b",1,2,1)
o<-c("c",1,1,2)
result<-matrix(c(l,m,n,o),nrow=4,ncol=4)

As my data set has 10 variables and hundreds of observations, I will have to automate the whole process.

Your help will be greatly appreciated. Thanks a lot!

Nikolay Nenov asked Apr 05 '13 17:04


1 Answer

With base R:

crossprod(as.matrix(y[,-1]))
#   a b c
# a 3 1 1
# b 1 2 1
# c 1 1 2
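
Why this works, as far as I can tell: crossprod(m) computes t(m) %*% m, so entry (i, j) is the sum over rows of m[, i] * m[, j]. With strictly 0/1 dummies that product is 1 only when both features are present, so the sum is the count asked for. The sketch below rebuilds the example data and checks the result against an explicit loop; the names m, counts, vars and by_loop are made up for illustration, and it assumes the columns contain only 0 and 1 (no NA).

```r
# Rebuild the example data from the question.
y <- data.frame(id = c(100, 200, 300, 400),
                a  = c(1, 1, 0, 1),
                b  = c(1, 0, 1, 0),
                c  = c(0, 0, 1, 1))

# Drop the id column and keep the 0/1 dummies as a numeric matrix.
m <- as.matrix(y[, -1])

# t(m) %*% m: entry (i, j) counts the rows where both dummies are 1.
counts <- crossprod(m)

# The same table built with an explicit loop, for comparison only.
vars <- colnames(m)
by_loop <- matrix(0, nrow = length(vars), ncol = length(vars),
                  dimnames = list(vars, vars))
for (i in vars) {
  for (j in vars) {
    by_loop[i, j] <- sum(m[, i] == 1 & m[, j] == 1)
  }
}

# Both approaches should agree cell by cell.
print(all(counts == by_loop))
```

The loop is only there to show what each cell means. For 10 variables and hundreds of rows, the single crossprod() call should be all that is needed.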
Josh O'Brien answered Sep 30 '22 11:09