 

Capture column pattern frequency

I have a dataset like this:

Id        A      B       C
10        1      0       1
11        1      0       1
12        1      1       0
13        1      0       0
14        0      1       1

I am trying to count how often each pattern of columns containing 1 occurs, like this:

 Pattern         Count
 A, C            2
 A, B            1
 A               1
 B, C            1

I'm not sure where to start; any help or advice is much appreciated.
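To make the answers easy to try, here is a minimal sketch that rebuilds the sample table as an R data frame. The object names are an assumption: the first answer calls the data `df` and the second calls it `dat`, so both are defined here.

```r
# Sample data from the question; column values copied row by row from the table
df <- data.frame(
  Id = 10:14,
  A  = c(1L, 1L, 1L, 1L, 0L),
  B  = c(0L, 0L, 1L, 0L, 1L),
  C  = c(1L, 1L, 0L, 0L, 1L)
)
dat <- df  # same data under the name the second answer uses
print(df)
```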

Ty Voss asked Feb 06 '23 20:02


2 Answers

If you don't need to group by ID, then simply:

table(apply(df[-1], 1, function(i) paste(names(i[i == 1]), collapse = ',')))

#  A A,B A,C B,C 
#  1   1   2   1 
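A small extension, assuming you want the result in the question's Pattern/Count layout rather than as a named table: the same `apply()` call, with the table converted by `as.data.frame()`. The `", "` separator and the sort by descending count are choices made here to mirror the question's output, not part of the answer.

```r
df <- data.frame(Id = 10:14,
                 A = c(1L, 1L, 1L, 1L, 0L),
                 B = c(0L, 0L, 1L, 0L, 1L),
                 C = c(1L, 1L, 0L, 0L, 1L))

# One string per row: the names of the columns equal to 1 (Id column dropped)
pattern <- apply(df[-1], 1, function(i) paste(names(i[i == 1]), collapse = ", "))

# Tabulate, then turn the table into a two-column data frame
counts <- as.data.frame(table(Pattern = pattern), responseName = "Count",
                        stringsAsFactors = FALSE)

# Most frequent pattern first; order() is stable, so ties keep the table's order
counts <- counts[order(-counts$Count), ]
rownames(counts) <- NULL
print(counts)
```

Note that a row with no 1s would be counted here under an empty string, because `paste()` of zero names collapses to `""`.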
Sotos answered Feb 08 '23 16:02


Start by "reversing" the tabulation: get the row and column index of every 1 in the data as two parallel index vectors:

w = which(dat[-1] == 1L, arr.ind = TRUE)  # matrix with columns "row" and "col", one row per 1

Then we could use:

table(tapply(names(dat)[-1][w[, "col"]], w[, "row"], paste, collapse = ", "))
#
#   A A, B A, C B, C 
#   1    1    2    1
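To make the two steps concrete, here is a sketch on the question's data, assuming the data frame is named `dat`. `which(..., arr.ind = TRUE)` lists the (row, column) position of every 1 in column-major order, and `tapply()` pastes the column names back together per row. One consequence: a row with no 1s never appears in `w`, so it is left out of the count entirely.

```r
dat <- data.frame(Id = 10:14,
                  A = c(1L, 1L, 1L, 1L, 0L),
                  B = c(0L, 0L, 1L, 0L, 1L),
                  C = c(1L, 1L, 0L, 0L, 1L))

# (row, col) position of every 1, Id column excluded; all of A first, then B, then C
w <- which(dat[-1] == 1L, arr.ind = TRUE)
print(cbind(row = w[, "row"], column = names(dat)[-1][w[, "col"]]))

# Paste the column names of the 1s together, grouped by row
per_row <- tapply(names(dat)[-1][w[, "col"]], w[, "row"], paste, collapse = ", ")
print(per_row)

# Counting the per-row strings gives the pattern frequencies
print(table(per_row))
```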

If the result is needed for more than display, you can keep each pattern as a character vector and skip an unnecessary paste/strsplit round trip. One alternative among many:

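pats = split(names(dat)[-1][w[, "col"]], w[, "row"])  # one character vector per row
upats = pats[!duplicated(pats)]  # unique patterns, keeping the row name of each first occurrence
data.frame(pat = I(upats), n = tabulate(match(pats, upats)))  # I() keeps the list as one column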
#   pat n
#1 A, C 2
#3 A, B 1
#4    A 1
#5 B, C 1
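Keeping each pattern as a character vector means follow-up questions can be answered with ordinary vector operations instead of splitting strings again. A sketch under the same `dat` assumption; `match()` converts list elements to character before matching, which is what lets it compare whole patterns. The two queries at the end (pattern size, whether a pattern contains `A`) are illustrative only.

```r
dat <- data.frame(Id = 10:14,
                  A = c(1L, 1L, 1L, 1L, 0L),
                  B = c(0L, 0L, 1L, 0L, 1L),
                  C = c(1L, 1L, 0L, 0L, 1L))
w <- which(dat[-1] == 1L, arr.ind = TRUE)

# List with one character vector per row, e.g. c("A", "C")
pats <- split(names(dat)[-1][w[, "col"]], w[, "row"])
upats <- unique(pats)

# Locate each row's pattern in upats and count occurrences per pattern
n <- tabulate(match(pats, upats), nbins = length(upats))

# Illustrative follow-ups that need no strsplit()
size  <- lengths(upats)                                   # columns per pattern
has_a <- vapply(upats, function(p) "A" %in% p, logical(1)) # does the pattern include A?
print(data.frame(n = n, size = size, has_a = has_a))
```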
alexis_laz answered Feb 08 '23 15:02