I have a data set that resembles the below:
SSN Auto MtgHe Personal Other None
A 1 1 0 0 0
B 1 1 0 0 0
C 1 0 0 0 0
D 1 0 1 1 0
E 0 0 0 0 1
F 0 0 0 0 1
G 0 0 0 0 1
SSN identifies the person; Auto, MtgHe, Personal and Other are loan categories; and 'None' means no loans are present. There are 15 possible loan combinations (2^4 - 1), plus one further possibility, 'None', which represents no loans at all. So a person could have only an Auto loan, or an Auto and a Personal loan, or no loan at all, for example. I would like a count of SSNs for each distinct combination. Using the table above, the results would look like:
Cnt Auto MtgHe Personal Other None
2 1 1 0 0 0
1 1 0 0 0 0
1 1 0 1 1 0
3 0 0 0 0 1
Any ideas on how to accomplish this in R? My data set really has tens of thousands of cases, but any help would be appreciated.
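To make the answers below reproducible, the sample table can be rebuilt as a data frame. This is only a sketch of the example data; the object name df is an assumption chosen to match what the answers use.

```r
# Sample data from the question; the name `df` is assumed to match the answers
df <- data.frame(
  SSN      = c("A", "B", "C", "D", "E", "F", "G"),
  Auto     = c(1, 1, 1, 1, 0, 0, 0),
  MtgHe    = c(1, 1, 0, 0, 0, 0, 0),
  Personal = c(0, 0, 0, 1, 0, 0, 0),
  Other    = c(0, 0, 0, 1, 0, 0, 0),
  None     = c(0, 0, 0, 0, 1, 1, 1)
)
```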
And the obligatory data.table version (it keeps the groups in order of first appearance instead of sorting them)
library(data.table)
setDT(df)[, .(Cnt = .N), .(Auto, MtgHe, Personal, Other, None)]
# Auto MtgHe Personal Other None Cnt
# 1: 1 1 0 0 0 2
# 2: 1 0 0 0 0 1
# 3: 1 0 1 1 0 1
# 4: 0 0 0 0 1 3
Or a shorter version could be
temp <- names(df)[-1]
setDT(df)[, .N, temp]
# Auto MtgHe Personal Other None N
# 1: 1 1 0 0 0 2
# 2: 1 0 0 0 0 1
# 3: 1 0 1 1 0 1
# 4: 0 0 0 0 1 3
And just for fun, here's another (unordered) base R version
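key <- do.call(paste, df[-1])
# Order the levels by first appearance so the counts line up with unique();
# rev() on the sorted tapply() result only matches when the data happen to be sorted
Cnt <- as.vector(tapply(df[, 1], factor(key, levels = unique(key)), length))
cbind(unique(df[-1]), Cnt)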
# Auto MtgHe Personal Other None Cnt
# 1 1 1 0 0 0 2
# 3 1 0 0 0 0 1
# 4 1 0 1 1 0 1
# 5 0 0 0 0 1 3
And an additional dplyr version for completeness
library(dplyr)
group_by(df, Auto, MtgHe, Personal, Other, None) %>% tally
# Source: local data frame [4 x 6]
# Groups: Auto, MtgHe, Personal, Other
#
# Auto MtgHe Personal Other None n
# 1 0 0 0 0 1 3
# 2 1 0 0 0 0 1
# 3 1 0 1 1 0 1
# 4 1 1 0 0 0 2
One option, using dplyr's count function:
library(dplyr)
count(df, Auto, MtgHe, Personal, Other, None) %>% ungroup()
#Source: local data frame [4 x 6]
#
# Auto MtgHe Personal Other None n
#1 0 0 0 0 1 3
#2 1 0 0 0 0 1
#3 1 0 1 1 0 1
#4 1 1 0 0 0 2
And for those who prefer base R without reordering:
x <- interaction(df[-1])
df <- transform(df, n = ave(seq_along(x), x, FUN = length))[!duplicated(x),-1]
# Auto MtgHe Personal Other None n
#1 1 1 0 0 0 2
#3 1 0 0 0 0 1
#4 1 0 1 1 0 1
#5 0 0 0 0 1 3
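Another base R route, offered only as a sketch and not taken from the answers above, is aggregate() with a formula. Unlike the version just shown it returns the combinations sorted, not in order of first appearance. The names df and res are assumptions, and the data are re-read here so the snippet runs on its own.

```r
# Rebuild the sample data so this snippet is self-contained
df <- read.table(header = TRUE, text = "
SSN Auto MtgHe Personal Other None
A 1 1 0 0 0
B 1 1 0 0 0
C 1 0 0 0 0
D 1 0 1 1 0
E 0 0 0 0 1
F 0 0 0 0 1
G 0 0 0 0 1
")

# Count SSNs within each combination of the five indicator columns
res <- aggregate(SSN ~ Auto + MtgHe + Personal + Other + None,
                 data = df, FUN = length)

# The count lands in a column named SSN; rename it for readability
names(res)[names(res) == "SSN"] <- "Cnt"
res
```

A quick sanity check is that sum(res$Cnt) equals nrow(df), since every SSN falls into exactly one combination.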