Frequency of each unique combination in data frame

Tags: dataframe, r

In a dataset (N = 6000), I would like to analyse how often combinations of 15 dummy variables occur.

ID      Var1   Var2   Var3   Var15
1       1      0      0      1
2       0      1      1      1
3       1      0      0      0
6000    1      0      0      0

For this example, I would like to see that the combination 1000 occurs twice, 1001 occurs once, and 0111 also occurs once.
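A minimal base R sketch of the output described above, assuming the four columns shown stand in for all 15 dummies and that ID is the first column: collapse each row's dummy values into one string key, then tabulate the keys. The names `key` and `combo_counts` are just illustrative.

```r
# Toy version of the example data: ID plus four of the 15 dummy columns
df <- data.frame(
  ID    = c(1L, 2L, 3L, 6000L),
  Var1  = c(1L, 0L, 1L, 1L),
  Var2  = c(0L, 1L, 0L, 0L),
  Var3  = c(0L, 1L, 0L, 0L),
  Var15 = c(1L, 1L, 0L, 0L)
)

# Paste the dummy columns of each row (everything except ID) into one key, e.g. "1001"
key <- do.call(paste0, df[-1])

# Count how often each observed key occurs; unobserved combinations never appear
combo_counts <- table(key)
combo_counts
```

Only combinations that actually occur in the data show up, so there is no need to create one variable per possible combination.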

The only way I can think of is to compute a variable for each possible combination...
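A quick count shows why one variable per possible combination scales badly, using only the numbers from the question (15 binary variables, N = 6000 rows). Each dummy doubles the number of possible patterns, but each row contributes at most one pattern:

```latex
\underbrace{2 \times 2 \times \cdots \times 2}_{15\ \text{dummies}} = 2^{15} = 32\,768\ \text{possible combinations},
\qquad
\#\{\text{observed combinations}\} \le \min\left(2^{15},\, N\right) = 6000
```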

Is there an elegant and efficient way to do this?

I have read through "How to summarize all possible combinations of variables?", but that is a slightly different question, and "Aggregating Tally counters" is beyond my knowledge (but if that is the answer to my question, I will go through it).

Sem asked Oct 18 '25 13:10


2 Answers

You can simply use dplyr's count() like this:

df = read.table(text = "
ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0
", header = TRUE)

library(dplyr)

df %>% count(Var1, Var2, Var3, Var15)

# # A tibble: 3 x 5
#     Var1  Var2  Var3 Var15     n
#    <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1
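With 15 dummies there can be many distinct combinations, so it may help to see the most frequent ones first. count() has a sort argument that orders the result by n, largest first. A sketch, assuming the same toy data as in the question:

```r
library(dplyr)

# Toy data: ID plus four of the dummy columns from the question
df <- data.frame(
  ID    = c(1L, 2L, 3L, 6000L),
  Var1  = c(1L, 0L, 1L, 1L),
  Var2  = c(0L, 1L, 0L, 0L),
  Var3  = c(0L, 1L, 0L, 0L),
  Var15 = c(1L, 1L, 0L, 0L)
)

# sort = TRUE puts the most common combination in the first row
combo_counts <- df %>% count(Var1, Var2, Var3, Var15, sort = TRUE)
combo_counts
```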

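Or, if you don't want to type (many) column names, pass them as a character vector with across() and all_of() (the older count_() does the same, but the underscore verbs are deprecated in current dplyr):

input_names = names(df)[-1]  # select all column names apart from the 1st one (ID)

df %>% count(across(all_of(input_names)))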

# # A tibble: 3 x 5
#    Var1  Var2  Var3 Var15     n
#   <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1
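If the dummy columns share a naming pattern, a tidyselect helper can pick them instead of relying on column positions. This sketch assumes dplyr >= 1.0 and that all 15 dummies are named Var1 ... Var15, so starts_with("Var") matches exactly those columns and excludes ID:

```r
library(dplyr)

# Toy data: ID plus four of the dummy columns from the question
df <- data.frame(
  ID    = c(1L, 2L, 3L, 6000L),
  Var1  = c(1L, 0L, 1L, 1L),
  Var2  = c(0L, 1L, 0L, 0L),
  Var3  = c(0L, 1L, 0L, 0L),
  Var15 = c(1L, 1L, 0L, 0L)
)

# Group by every column whose name starts with "Var" and count rows per combination
combo_counts <- df %>% count(across(starts_with("Var")))
combo_counts
```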

If you want to combine your variables into a single (combo) variable, you can do this:

library(dplyr)
library(tidyr)

input_names = names(df)[-1]

df %>% count(across(all_of(input_names))) %>% unite("ComboVar", all_of(input_names), sep = "")

# # A tibble: 3 x 2
#   ComboVar     n
# * <chr>    <int>
# 1 0111         1
# 2 1000         2
# 3 1001         1
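To express how often each combination occurs as a share of all rows rather than a raw count, one option is to add a proportion column with mutate(). A sketch, assuming dplyr >= 1.0, tidyr >= 1.0 and the toy data from the question; prop is a hypothetical column name:

```r
library(dplyr)
library(tidyr)

# Toy data: ID plus four of the dummy columns from the question
df <- data.frame(
  ID    = c(1L, 2L, 3L, 6000L),
  Var1  = c(1L, 0L, 1L, 1L),
  Var2  = c(0L, 1L, 0L, 0L),
  Var3  = c(0L, 1L, 0L, 0L),
  Var15 = c(1L, 1L, 0L, 0L)
)

input_names <- names(df)[-1]  # every column except ID

combo_share <- df %>%
  count(across(all_of(input_names))) %>%               # rows per combination
  unite("ComboVar", all_of(input_names), sep = "") %>%  # e.g. "1000"
  mutate(prop = n / sum(n))                             # share of all rows
combo_share
```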
AntoniosK answered Oct 21 '25 03:10


Using the dplyr package, you could also group by the variables and then tally the rows in each group:

library(dplyr)
df %>% group_by(Var1, Var2, Var3, Var15) %>% tally()
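group_by() followed by tally() produces the same counts as count() with the same variables. A small check on the toy data from the question; since tally() can leave the result grouped, ungroup() is called before comparing:

```r
library(dplyr)

# Toy data: ID plus four of the dummy columns from the question
df <- data.frame(
  ID    = c(1L, 2L, 3L, 6000L),
  Var1  = c(1L, 0L, 1L, 1L),
  Var2  = c(0L, 1L, 0L, 0L),
  Var3  = c(0L, 1L, 0L, 0L),
  Var15 = c(1L, 1L, 0L, 0L)
)

# Explicit grouping, then one row per group with the number of rows in n
via_tally <- df %>%
  group_by(Var1, Var2, Var3, Var15) %>%
  tally() %>%
  ungroup()

# count() does the grouping and tallying in one call
via_count <- df %>% count(Var1, Var2, Var3, Var15)
```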
Constantinos answered Oct 21 '25 03:10


