Let's consider the following data frame:
set.seed(123)
data <- data.frame(col1 = factor(rep(c("A", "B", "C"), 4)),
                   col2 = factor(c(rep(c("A", "B", "C"), 3), c("A", "A", "A"))),
                   val1 = 1:12,
                   val2 = rnorm(12, 10, 15))
The contingency table is as follows:
cont_tab <- table(data$col1, data$col2, dnn = c("col1", "col2"))
cont_tab
    col2
col1 A B C
   A 4 0 0
   B 1 3 0
   C 1 0 3
As you can see, some pairs didn't occur: (A,B), (A,C), (B,C), (C,B). The end goal of my analysis is to list all of the pairs (9 in this case) and show a statistic for each of them. While using the dplyr::group_by() function I hit a limitation: dplyr::group_by() considers only existing pairs (pairs that occurred at least once):
library(dplyr)

data %>%
  group_by(col1, col2) %>%
  summarize(stat = sum(val2) - sum(val1))
# A tibble: 5 x 3
# Groups:   col1 [?]
  col1  col2   stat
  <fct> <fct> <dbl>
1 A     A      58.1
2 B     A     -16.4
3 B     B      17.0
4 C     A     -12.9
5 C     C     -41.9
The output I have in mind has 9 rows (4 of which have stat equal to 0). Is this doable in dplyr?
EDIT: Sorry for being too vague at the beginning. The real problem is more complex than counting the number of times a particular pair occurs. I added the new data in order to make the real problem more visible.
It is easy to get the same result as with table by adding spread from tidyr:
library(dplyr)
library(tidyr)
count(data, col1, col2) %>%
  spread(col2, n, fill = 0)
# A tibble: 3 x 4
# Groups:   col1 [3]
#  col1      A     B     C
#  <fct> <dbl> <dbl> <dbl>
#1 A         4     0     0
#2 B         1     3     0
#3 C         1     0     3
NOTE: the group_by/summarize step is replaced by count here.
As @divibisan suggested, if the OP wants the result in long format, add gather at the end:
data %>%
   group_by(col1, col2) %>%
   summarize(stat = n()) %>%
   spread(col2, stat, fill = 0) %>%
   gather(col2, stat, A:C)
# A tibble: 9 x 3
# Groups:   col1 [3]
#  col1  col2   stat
#  <fct> <chr> <dbl>
#1 A     A         4
#2 B     A         1
#3 C     A         1
#4 A     B         0
#5 B     B         3
#6 C     B         0
#7 A     C         0
#8 B     C         0
#9 C     C         3
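If the row order of the long result matters, an arrange() call can be appended; this is a small addition of mine, not part of the answer above:
data %>%
   group_by(col1, col2) %>%
   summarize(stat = n()) %>%
   spread(col2, stat, fill = 0) %>%
   gather(col2, stat, A:C) %>%
   # sort by col1 first, then col2, instead of the column-wise order gather produces
   arrange(col1, col2)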
With the updated data in the OP's post:
data %>%
   group_by(col1, col2) %>%
   summarize(stat = sum(val2) - sum(val1)) %>% 
   spread(col2, stat, fill = 0)  %>% 
   gather(col2, stat, -1)
# A tibble: 9 x 3
# Groups:   col1 [3]
#  col1  col2    stat
#  <fct> <chr>  <dbl>
#1 A     A       7.76
#2 B     A     -20.8 
#3 C     A       6.97
#4 A     B       0   
#5 B     B      28.8 
#6 C     B       0   
#7 A     C       0   
#8 B     C       0   
#9 C     C       9.56
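A related approach, not part of the answer above and sketched here only as an alternative, keeps everything in long format and fills in the missing factor combinations with tidyr::complete():
library(dplyr)
library(tidyr)

data %>%
   group_by(col1, col2) %>%
   summarize(stat = sum(val2) - sum(val1)) %>%
   ungroup() %>%
   # complete() adds every unused combination of the factor levels of col1 and col2,
   # filling the stat column with 0 for those rows
   complete(col1, col2, fill = list(stat = 0))
In more recent dplyr versions (0.8+), group_by(col1, col2, .drop = FALSE) should also keep the empty factor combinations, which makes the spread/gather round trip unnecessary.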