
Creating a contingency table using multiple columns in a data frame in R

Tags:

r

contingency

I have a data frame which looks like this:

structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1), bc = c(1, 
1, 1, 1, 0, 0, 0, 1, 0, 1), de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 
1), cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)), .Names = c("ab", "bc", 
"de", "cl"), row.names = c(NA, -10L), class = "data.frame")

The column cl holds each row's cluster assignment, and the variables ab, bc and de hold binary answers, where 1 means yes and 0 means no.

I am trying to cross-tabulate the cluster against all the other columns in the data frame (ab, bc and de), with the clusters as the columns of the result. Each cell should hold the number of 1s for that variable within that cluster. The desired output looks like this:

    1  2  3
 ab 1  3  2
 bc 2  3  1
 de 2  3  1

I tried the following code:

with(newdf, tapply(newdf[,c(3)], cl, sum))

This gives me the sums for only one column at a time. My data frame has 1600+ answer columns plus the one cluster column. How can I do this for all columns at once?

Apricot asked Oct 31 '15 19:10



2 Answers

One way using dplyr would be:

library(dplyr)
df %>% 
  #group by the variable cl
  group_by(cl) %>%
  #sum every column
  summarize_each(funs(sum)) %>%
  #select the three needed columns
  select(ab, bc, de) %>%
  #transpose the df
  t

Output:

   [,1] [,2] [,3]
ab    1    3    2
bc    2    3    1
de    2    3    1
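
In more recent dplyr releases, summarize_each() and funs() are deprecated. The following is a minimal sketch of the same idea using across(), assuming dplyr 1.0.0 or later and assuming the data frame is called newdf as in the question (it is rebuilt here so the snippet runs on its own). Because across(everything(), ...) skips the grouping column, none of the 1600+ answer columns has to be named by hand:

```r
library(dplyr)

# Sample data from the question (the name newdf is taken from the asker's code)
newdf <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Sum every non-grouping column within each cluster
sums <- newdf %>%
  group_by(cl) %>%
  summarise(across(everything(), sum))

# Drop the cluster column, transpose, and label the columns by cluster
result <- t(as.matrix(sums[, -1]))
colnames(result) <- as.character(sums$cl)
result
#    1 2 3
# ab 1 3 2
# bc 2 3 1
# de 2 3 1
```

Setting the column names from sums$cl is an addition to the original answer; it replaces the [,1] [,2] [,3] headers with the cluster labels.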
LyzandeR answered Sep 17 '22 13:09



In base R:

t(sapply(data[, 1:3], function(x) tapply(x, data[, 4], sum)))
#   1 2 3
#ab 1 3 2
#bc 2 3 1
#de 2 3 1
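
With 1600+ columns, hard-coding the positions 1:3 and 4 may be inconvenient. As a hedged alternative (not part of the original answer), base R's rowsum() sums all columns by group in one call. The sketch below assumes the data frame is called newdf, as in the question, and that the cluster column is named cl; value_cols is a hypothetical helper name:

```r
# Sample data from the question (the name newdf is taken from the asker's code)
newdf <- data.frame(
  ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
  bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
  de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
  cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)
)

# Every column except the cluster column, selected by name rather than position
value_cols <- setdiff(names(newdf), "cl")

# rowsum() gives one row per cluster; transpose so clusters become columns
result <- t(rowsum(newdf[value_cols], group = newdf$cl))
result
#    1 2 3
# ab 1 3 2
# bc 2 3 1
# de 2 3 1
```

Selecting columns by name means the same two lines work unchanged regardless of how many answer columns there are or where cl sits in the data frame.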
nicola answered Sep 20 '22 13:09
