Frequency table including zeros for unused values, on a data.table

Tags:

I have a data set that is as follows:

library(data.table)

test <- data.table(structure(list(Issue.Date = structure(c(16041, 16056, 16042,15990, 15996, 16001, 15995, 15981, 15986, 15996, 15996, 16002,16015, 16020, 16025, 16032, 16023, 16084, 16077, 16102, 16104,16107, 16112, 16113, 16115, 16121, 16125, 16128, 16104, 16132,16133, 16135, 16139, 16146, 16151), class = "Date"), 
    Complaint = structure(c(1L,4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,5L, 3L, 1L, 3L, 1L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 3L,3L, 3L), .Label = c("A", "B", "C", "D", "E"), class = "factor"),
    yr = c("2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2013", "2013", "2013", "2013", "2013", "2013","2013", "2013", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014", "2014", "2014", "2014", "2014","2014", "2014", "2014", "2014"), 
    Month = c("2013-12", "2013-12","2013-12", "2013-10", "2013-10", "2013-10", "2013-10", "2013-10","2013-10", "2013-10", "2013-10", "2013-10", "2013-11", "2013-11","2013-11", "2013-11", "2013-11", "2014-01", "2014-01", "2014-02","2014-02", "2014-02", "2014-02", "2014-02", "2014-02", "2014-02","2014-02", "2014-02", "2014-02", "2014-03", "2014-03", "2014-03","2014-03", "2014-03", "2014-03"), 
    da = c("02", "17", "03","12", "18", "23", "17", "03", "08", "18", "18", "24", "06","11", "16", "23", "14", "14", "07", "01", "03", "06", "11","12", "14", "20", "24", "27", "03", "03", "04", "06", "10","17", "22")), 
   .Names = c("Issue.Date", "Complaint", "yr","Month", "da"), class = c("data.table", "data.frame"), row.names = c(NA,-35L)))

Basically what I would like to do is use data.table to create a frequency table that has Complaint and Count by Month. The trick is that I need it to show a Count of zero if there are no Complaints of that type for that Month. I know how to do it without showing the zeros, but I want to know how to include them.

test[ , count := .N, by = "Month,Complaint"]

782

asked May 08 '14 16:05

black_sheep07

2 Answers

To directly get the counts for each group:

setkey(test, Month, Complaint)

# may need to also add allow.cartesian, depending on actual data
test[CJ(Month, Complaint, unique = TRUE), .N, by = .EACHI]
#      Month Complaint N
# 1: 2013-10         A 0
# 2: 2013-10         B 0
# 3: 2013-10         C 5
# 4: 2013-10         D 4
# 5: 2013-10         E 0
# 6: 2013-11         A 1
# 7: 2013-11         B 0
# 8: 2013-11         C 4
# 9: 2013-11         D 0
#10: 2013-11         E 0
#11: 2013-12         A 1
#12: 2013-12         B 0
#13: 2013-12         C 0
#14: 2013-12         D 2
#15: 2013-12         E 0
#16: 2014-01         A 0
#17: 2014-01         B 0
#18: 2014-01         C 1
#19: 2014-01         D 0
#20: 2014-01         E 1
#21: 2014-02         A 2
#22: 2014-02         B 0
#23: 2014-02         C 6
#24: 2014-02         D 2
#25: 2014-02         E 0
#26: 2014-03         A 1
#27: 2014-03         B 2
#28: 2014-03         C 3
#29: 2014-03         D 0
#30: 2014-03         E 0
#      Month Complaint N

See first revision of the answer if you want to have the counts in the full data.table instead of summarizing.

179

answered Oct 26 '22 07:10

eddi

It sounds like you perhaps need to use expand.grid to "fill in" your data.table:

EG <- data.table(expand.grid(Complaint = unique(test$Complaint), 
                             Month = unique(test$Month)), 
                 key = "Complaint,Month")

Then, you can merge:

setkey(test, Complaint, Month)
Full <- merge(test, EG, all.y = TRUE)

And count like this:

Full[ , list(sum(!is.na(Issue.Date))), by = "Month,Complaint"]
#       Month Complaint V1
#  1: 2013-11         A  1
#  2: 2013-12         A  1
#  3: 2014-02         A  2
#  4: 2014-03         A  1
#  5: 2013-10         A  0
#  6: 2014-01         A  0
#  7: 2013-11         B  0
#  8: 2013-12         B  0
# ::: S N I P :::
# 24: 2014-01         D  0
# 25: 2013-11         E  0
# 26: 2013-12         E  0
# 27: 2014-02         E  0
# 28: 2014-03         E  0
# 29: 2013-10         E  0
# 30: 2014-01         E  1
#       Month Complaint V1

Alternatively, just use table (???)

data.table(table(test[, c("Month", "Complaint")]))
#       Month Complaint N
#  1: 2013-10         A 0
#  2: 2013-11         A 1
#  3: 2013-12         A 1
#  4: 2014-01         A 0
#  5: 2014-02         A 2
#  6: 2014-03         A 1
#  7: 2013-10         B 0
# ::: S N I P :::
# 28: 2014-01         E 1
# 29: 2014-02         E 0
# 30: 2014-03         E 0
#       Month Complaint N

answered Oct 26 '22 06:10

A5C1D2H2I1M1N2O1R2T1

Related questions
                            
                                Finding all functions in current workspace
                            
                                Print data frame with columns center-aligned
                            
                                How to add the total sums to the table and get proportion for each cell in R
                            
                                How to load only specific functions from a package
                            
                                Apply t-test on many columns in a dataframe split by factor
                            
                                Identify dates in the same week
                            
                                Fast melted data.table operations
                            
                                trouble adding geom_vline to ggplot2
                            
                                Set page width in Knitr for md or HTML output
                            
                                merge 3 data.frames by column names
                            
                                Mahalanobis distance in R
                            
                                Why does 1..99,999 == "1".."99,999" in R, but 100,000 != "100,000"?
                            
                                SpatialLinesDataFrame: how to calculate the min. distance between a point and a line
                            
                                2 Column Report in R Markdown - Render HTML aside Data Frame
                            
                                How to calculate any negative number to the power of some fraction in R?
                            
                                efficient use of R data.table and unique()
                            
                                Formatting reactive data.frames in Shiny
                            
                                Silence messages about masked functions
                            
                                Subtracting months - issue with last day of month?
                            
                                How to color entire background in ggplot2 when using coord_fixed

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Frequency table including zeros for unused values, on a data.table

Tags:

r

data.table

frequency-distribution

black_sheep07

People also ask

2 Answers

eddi

A5C1D2H2I1M1N2O1R2T1

Recent Activity

Donate For Us