Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Frequency table including zeros for unused values, on a data.table

I have a data set that is as follows:

library(data.table)

test <- data.table(structure(list(Issue.Date = structure(c(16041, 16056, 16042,15990, 15996, 16001, 15995, 15981, 15986, 15996, 15996, 16002,16015, 16020, 16025, 16032, 16023, 16084, 16077, 16102, 16104,16107, 16112, 16113, 16115, 16121, 16125, 16128, 16104, 16132,16133, 16135, 16139, 16146, 16151), class = "Date"), 
    Complaint = structure(c(1L,4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,5L, 3L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 3L,3L, 3L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
    yr = c("2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014"), 
    Month = c("2013-12", "2013-12","2013-12", "2013-10", "2013-10", "2013-10", "2013-10", "2013-10","2013-10", "2013-10", "2013-10", "2013-10", "2013-11", "2013-11","2013-11", "2013-11", "2013-11", "2014-01", "2014-01", "2014-02","2014-02", "2014-02", "2014-02", "2014-02", "2014-02", "2014-02","2014-02", "2014-02", "2014-02", "2014-03", "2014-03", "2014-03","2014-03", "2014-03", "2014-03"), 
    da = c("02", "17", "03","12", "18", "23", "17", "03", "08", "18", "18", "24", "06","11", "16", "23", "14", "14", "07", "01", "03", "06", "11","12", "14", "20", "24", "27", "03", "03", "04", "06", "10","17", "22")), 
   .Names = c("Issue.Date", "Complaint", "yr","Month", "da"), class = c("data.table", "data.frame"), row.names = c(NA,-35L)))

Basically what I would like to do is use data.table to create a frequency table that has Complaint and Count by Month. The trick is that I need it to show a Count of zero if there are no Complaints of that type for that Month. I know how to do it without showing the zeros, but I want to know how to include them.

test[ , count := .N, by = "Month,Complaint"]
like image 782
black_sheep07 Avatar asked May 08 '14 16:05

black_sheep07


People also ask

Do you count 0 in frequency distribution?

The frequencies of the first and last class must be greater than zero. The frequency of any other class may be zero. If you tallied correctly, the sum of all the frequencies should equal the total number of data values.

How do you represent data in a frequency table?

To make such a frequency distribution table, first, write the class intervals in one column. Next, tally the numbers in each category based on the number of times it appears. Finally, write the frequency in the final column. A frequency distribution table drawn above is called a grouped frequency distribution table.


2 Answers

To directly get the counts for each group:

setkey(test, Month, Complaint)

# may need to also add allow.cartesian, depending on actual data
test[CJ(Month, Complaint, unique = TRUE), .N, by = .EACHI]
#      Month Complaint N
# 1: 2013-10         A 0
# 2: 2013-10         B 0
# 3: 2013-10         C 5
# 4: 2013-10         D 4
# 5: 2013-10         E 0
# 6: 2013-11         A 1
# 7: 2013-11         B 0
# 8: 2013-11         C 4
# 9: 2013-11         D 0
#10: 2013-11         E 0
#11: 2013-12         A 1
#12: 2013-12         B 0
#13: 2013-12         C 0
#14: 2013-12         D 2
#15: 2013-12         E 0
#16: 2014-01         A 0
#17: 2014-01         B 0
#18: 2014-01         C 1
#19: 2014-01         D 0
#20: 2014-01         E 1
#21: 2014-02         A 2
#22: 2014-02         B 0
#23: 2014-02         C 6
#24: 2014-02         D 2
#25: 2014-02         E 0
#26: 2014-03         A 1
#27: 2014-03         B 2
#28: 2014-03         C 3
#29: 2014-03         D 0
#30: 2014-03         E 0
#      Month Complaint N

See first revision of the answer if you want to have the counts in the full data.table instead of summarizing.

like image 179
eddi Avatar answered Oct 26 '22 07:10

eddi


It sounds like you perhaps need to use expand.grid to "fill in" your data.table:

EG <- data.table(expand.grid(Complaint = unique(test$Complaint), 
                             Month = unique(test$Month)), 
                 key = "Complaint,Month")

Then, you can merge:

setkey(test, Complaint, Month)
Full <- merge(test, EG, all.y = TRUE)

And count like this:

Full[ , list(sum(!is.na(Issue.Date))), by = "Month,Complaint"]
#       Month Complaint V1
#  1: 2013-11         A  1
#  2: 2013-12         A  1
#  3: 2014-02         A  2
#  4: 2014-03         A  1
#  5: 2013-10         A  0
#  6: 2014-01         A  0
#  7: 2013-11         B  0
#  8: 2013-12         B  0
# ::: S N I P :::
# 24: 2014-01         D  0
# 25: 2013-11         E  0
# 26: 2013-12         E  0
# 27: 2014-02         E  0
# 28: 2014-03         E  0
# 29: 2013-10         E  0
# 30: 2014-01         E  1
#       Month Complaint V1

Alternatively, just use table (???)

data.table(table(test[, c("Month", "Complaint")]))
#       Month Complaint N
#  1: 2013-10         A 0
#  2: 2013-11         A 1
#  3: 2013-12         A 1
#  4: 2014-01         A 0
#  5: 2014-02         A 2
#  6: 2014-03         A 1
#  7: 2013-10         B 0
# ::: S N I P :::
# 28: 2014-01         E 1
# 29: 2014-02         E 0
# 30: 2014-03         E 0
#       Month Complaint N
like image 37
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 26 '22 06:10

A5C1D2H2I1M1N2O1R2T1