
count occurrences in unique group combination

I have a data set that resembles the below:

SSN  Auto  MtgHe  Personal  Other  None
A    1     1      0         0      0
B    1     1      0         0      0
C    1     0      0         0      0
D    1     0      1         1      0
E    0     0      0         0      1
F    0     0      0         0      1
G    0     0      0         0      1

SSN identifies the person; Auto, MtgHe, Personal, and Other are loan categories, and 'None' means no loans are present. The four loan categories allow 15 possible combinations, plus 'None' for no loans at all. So a person could have only an Auto loan, or an Auto and a Personal loan, or no loan at all, for example. I would like a count of SSNs for each combination. Using the table above, the results would look like:

Cnt  Auto  MtgHe  Personal  Other  None
2    1     1      0         0      0
1    1     0      0         0      0
1    1     0      1         1      0
3    0     0      0         0      1

Any ideas on how to accomplish this in R? My actual data set has tens of thousands of cases, so any help would be appreciated.
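A minimal sketch for reproducing the example, assuming the table is held in a plain data frame called `df` (the name the answers below use; it does not appear in the question itself):

```r
# Example data copied from the table above.
# The object name `df` is an assumption, chosen to match the answers.
df <- data.frame(
  SSN      = c("A", "B", "C", "D", "E", "F", "G"),
  Auto     = c(1, 1, 1, 1, 0, 0, 0),
  MtgHe    = c(1, 1, 0, 0, 0, 0, 0),
  Personal = c(0, 0, 0, 1, 0, 0, 0),
  Other    = c(0, 0, 0, 1, 0, 0, 0),
  None     = c(0, 0, 0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
```

Note that `setDT()` converts `df` to a data.table in place, so it may be safer to rebuild `df` before trying the base R snippets.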

user3067851 asked Jan 07 '15 22:01


2 Answers

And the obligatory data.table version (it keeps the groups in order of first appearance instead of sorting them)

library(data.table)
setDT(df)[, .(Cnt = .N), .(Auto, MtgHe, Personal, Other, None)]
#    Auto MtgHe Personal Other None Cnt
# 1:    1     1        0     0    0   2
# 2:    1     0        0     0    0   1
# 3:    1     0        1     1    0   1
# 4:    0     0        0     0    1   3

Or a shorter version could be

temp <- names(df)[-1]
setDT(df)[, .N, temp]
#    Auto MtgHe Personal Other None N
# 1:    1     1        0     0    0 2
# 2:    1     0        0     0    0 1
# 3:    1     0        1     1    0 1
# 4:    0     0        0     0    1 3

And just for fun, here's another (unordered) base R version

key <- do.call(paste, df[-1])               # one string per row's loan combination
Cnt <- as.vector(table(key)[unique(key)])   # counts in order of first appearance
cbind(unique(df[-1]), Cnt)
#   Auto MtgHe Personal Other None Cnt
# 1    1     1        0     0    0   2
# 3    1     0        0     0    0   1
# 4    1     0        1     1    0   1
# 5    0     0        0     0    1   3

And an additional dplyr version for completeness

library(dplyr)
group_by(df, Auto, MtgHe, Personal, Other, None) %>% tally
# Source: local data frame [4 x 6]
# Groups: Auto, MtgHe, Personal, Other
# 
#   Auto MtgHe Personal Other None n
# 1    0     0        0     0    1 3
# 2    1     0        0     0    0 1
# 3    1     0        1     1    0 1
# 4    1     1        0     0    0 2
David Arenburg answered Sep 30 '22 19:09


One option, using dplyr's count function:

library(dplyr)
count(df, Auto, MtgHe, Personal, Other, None) %>% ungroup()
#Source: local data frame [4 x 6]
#
#  Auto MtgHe Personal Other None n
#1    0     0        0     0    1 3
#2    1     0        0     0    0 1
#3    1     0        1     1    0 1
#4    1     1        0     0    0 2

And for those who prefer base R, without reordering the rows:

x <- interaction(df[-1])
df <- transform(df, n = ave(seq_along(x), x, FUN = length))[!duplicated(x),-1]
#  Auto MtgHe Personal Other None n
#1    1     1        0     0    0 2
#3    1     0        0     0    0 1
#4    1     0        1     1    0 1
#5    0     0        0     0    1 3
talat answered Sep 30 '22 20:09