Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R - count all combinations

I would like to count all combinations in a data.frame.

The data look like this

   9 10 11 12
1  1  1  1  1
2  0  0  0  0
3  0  0  0  0
4  1  1  1  1
5  1  1  1  1
6  0  0  0  0
7  1  0  0  1
8  1  0  0  1
9  1  1  1  1
10 1  1  1  1

The output I want is simply

comb     n 
1 1 1 1  5
0 0 0 0  3 
1 0 0 1  2 

Do you know any simple function to do that ?

Thanks

dt = structure(list(`9` = c(1, 0, 0, 1, 1, 0, 1, 1, 1, 1), `10` = c(1, 
0, 0, 1, 1, 0, 0, 0, 1, 1), `11` = c(1, 0, 0, 1, 1, 0, 0, 0, 
1, 1), `12` = c(1, 0, 0, 1, 1, 0, 1, 1, 1, 1)), .Names = c("9", 
"10", "11", "12"), class = "data.frame", row.names = c(NA, -10L
))
like image 274
giac Avatar asked Dec 16 '15 12:12

giac


People also ask

How do you count the number of combinations in R?

To find the count of unique group combinations in an R data frame, we can use count function of dplyr package along with ungroup function.

How do you count occurrences in R?

To count occurrences between columns, simply use both names, and it provides the frequency between the values of each column. This process produces a dataset of all those comparisons that can be used for further processing. It expands the variety a comparison you can make.

How do you find permutations and combinations in R?

Combinat package in R programming language can be used to calculate permutations and combinations of the numbers. It provides routines and methods to perform combinatorics. combn() method in R language belonging to this package is used to generate all combinations of the elements of x taken m at a time.

How do I count the number of occurrences by a group in R?

group_by() function along with n() is used to count the number of occurrences of the group in R. group_by() function takes “State” and “Name” column as argument and groups by these two columns and summarise() uses n() function to find count of a sales.


2 Answers

We can either use data.table or dplyr. These are very efficient. We convert the 'data.frame' to 'data.table' (setDT(dt)), grouped by all the columns of 'dt' (names(dt)), we get the nrow (.N) as the 'Count'

library(data.table)
setDT(dt)[,list(Count=.N) ,names(dt)]

Or we can use a similar methodology using dplyr.

library(dplyr)
names(dt) <- make.names(names(dt))
dt %>%
   group_by_(.dots=names(dt)) %>%
   summarise(count= n())

Benchmarks

In case somebody wants to look at some metrics (and also to backup my claim earlier (efficient!)),

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:1, 1e6*6, replace=TRUE), ncol=6))

akrunDT <-  function() {
  as.data.table(df1)[,list(Count=.N) ,names(df1)]
 }

akrunDplyr <- function() {
  df1 %>%
    group_by_(.dots=names(df1)) %>%
    summarise(count= n())
}

cathG <- function() {
 aggregate(cbind(n = 1:nrow(df1))~., df1, length)
  }

docendoD <- function() {
  as.data.frame(table(comb = do.call(paste, df1)))
}

deena <- function() {
   table(apply(df1, 1, paste, collapse = ","))
}

Here are the microbenchmark results

library(microbenchmark)
microbenchmark(akrunDT(), akrunDplyr(), cathG(), docendoD(),  deena(),
  unit='relative', times=20L)
#   Unit: relative
#        expr       min        lq      mean   median        uq        max neval  cld
#     akrunDT()  1.000000  1.000000  1.000000  1.00000  1.000000  1.0000000    20     a   
#  akrunDplyr()  1.512354  1.523357  1.307724  1.45907  1.365928  0.7539773    20     a   
#       cathG() 43.893946 43.592062 37.008677 42.10787 38.556726 17.9834245    20    c 
#    docendoD() 18.778534 19.843255 16.560827 18.85707 17.296812  8.2688541    20    b  
#       deena() 90.391417 89.449547 74.607662 85.16295 77.316143 34.6962954    20    d
like image 143
akrun Avatar answered Oct 30 '22 01:10

akrun


The dplyr solution above could have been done easier with group_by_all()...

dt %>% group_by_all %>% count

...which as I understand has been superseded by the across() method. Adding in a bit of sorting, and you get:

dt %>% group_by(across()) %>% count %>% arrange(desc(n))

> dt %>% group_by(across()) %>% count %>% arrange(desc(n))
# A tibble: 3 x 5
# Groups:   9, 10, 11, 12 [3]
    `9`  `10`  `11`  `12`     n
  <dbl> <dbl> <dbl> <dbl> <int>
1     1     1     1     1     5
2     0     0     0     0     3
3     1     0     0     1     2

Which you could cast to a matrix if you wished.

like image 38
Mike Dolan Fliss Avatar answered Oct 30 '22 00:10

Mike Dolan Fliss