Use a factor column in "by" and do not drop empty factors

Tags: r, data.table

Suppose I have a data.table:

library(data.table)
x <- data.table(x=runif(3), group=factor(c('a','b','a'), levels=c('a','b','c')))

I want to know how many rows in x exist for each group:

x[, .N, by="group"]
#    group N
# 1:     a 2
# 2:     b 1

Question: is there some way to force the above by="group" to consider all levels of the factor group?

Notice that since the table has no rows with group 'c', I don't get a row for c.

Desired output:

x[, .N, by="group", ???] # somehow use all levels in `group`
#    group N
# 1:     a 2
# 2:     b 1
# 3:     c 0
mathematical.coffee asked May 14 '13 11:05


1 Answer

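If you are willing to enumerate the factor levels in i (rather than setting by="group"), you can get the desired result. Setting the key on group lets the character vector levels(group) in i join against that column, and by=.EACHI evaluates .N once per level, so a level with no matching rows gets a count of 0.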

setkey(x, "group")
x[levels(group), .N, by=.EACHI]
#    group N
# 1:     a 2
# 2:     b 1
# 3:     c 0
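
As a variation on the same idea, here is a minimal sketch that leaves the table's key untouched by passing on="group" instead of calling setkey. It is not part of the original answer; the names all_levels and counts are assumptions chosen for illustration, and the sketch assumes a data.table version that supports the on= argument.

```r
library(data.table)

x <- data.table(x = runif(3),
                group = factor(c("a", "b", "a"), levels = c("a", "b", "c")))

# Hypothetical helper table: one row per level of the factor,
# including levels that never occur in x.
all_levels <- data.table(group = levels(x$group))

# Join x against the lookup table; by = .EACHI evaluates .N once
# per row of all_levels, so unmatched levels are still reported.
counts <- x[all_levels, .N, by = .EACHI, on = "group"]
print(counts)

# Cross-check with base R: table() also counts empty factor levels.
print(table(x$group))
```

The counts for a, b and c should agree with the output shown above and with the table() cross-check.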
Josh O'Brien answered Oct 06 '22 14:10