
Skip NA in data.table by




I'd like to use data.table but skip the calculation in the j part when the by value is missing (NA).

Here is an example data.table:

library(data.table)
DT <- data.table(y=10, g=c(1,1,1,2,2,2,2,2,NA,NA))

It looks like this:

> DT
     y  g
 1: 10  1
 2: 10  1
 3: 10  1
 4: 10  2
 5: 10  2
 6: 10  2
 7: 10  2
 8: 10  2
 9: 10 NA
10: 10 NA

Now I'd like to group with by= on g. Rows 9 and 10 are lumped together because they share the same value, NA:

> DT[,.N, by=g]
    g N
1:  1 3
2:  2 5
3: NA 2

I'd like to keep the NA row in the output but skip the calculation for it, i.e., get the following output, where N is NA when g is NA:

> DT[,.N, by=g]
    g N
1:  1 3
2:  2 5
3: NA NA

I thought I could access the value of g through .GRP but that only gives the group index and not the value. Is it possible to make the calculation conditional on the missing status of the by variable?

ekstroem asked Nov 22 '17 21:11


1 Answer

You may try this one:

DT[, .N * NA^is.na(g), by = g]
    g V1
1:  1  3
2:  2  5
3: NA NA

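It is an algebraic version of Henrik's if ... else ... clause. It uses the fact that NA^0 returns 1 while NA^1 returns NA, and that FALSE and TRUE are coerced to 0 and 1, respectively. For a group where g is not NA, .N is therefore multiplied by 1; for the NA group it is multiplied by NA.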
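The arithmetic behind the trick can be checked in base R without data.table. This is only a small sketch; the vectors n and g and the name masked are illustrative stand-ins for the group sizes and group values, not part of the original answer.

```r
# Base R only: the arithmetic behind NA^is.na(g)
print(NA^0)      # anything raised to the power 0 is 1, even NA
print(NA^1)      # NA
print(NA^FALSE)  # FALSE is coerced to 0, so this is 1
print(NA^TRUE)   # TRUE is coerced to 1, so this is NA

# Illustrative stand-ins for the group values and group sizes
g <- c(1, 2, NA)
n <- c(3L, 5L, 2L)

# Multiplying by 1 keeps the count, multiplying by NA masks it
masked <- n * NA^is.na(g)
print(masked)
```

Note that the product is a double rather than an integer, because NA^is.na(g) is numeric.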

If you want to control the column name:

DT[, .(n = .N * NA^is.na(g)), by = g]
    g  n
1:  1  3
2:  2  5
3: NA NA

Alternatively, if the above appears too tricky, you can resort to data.table chaining (thanks to Sotos for bringing this up):

DT[, .N, by = g][is.na(g), N := NA][]

This aggregates first and then sets N to NA in the row where g is NA. The trailing [] only makes the result print after the := assignment.
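For comparison, the chained form can be set next to an explicit conditional in j. The following is a minimal sketch, assuming data.table is installed; the names chained and conditional are illustrative, and the if/else shown here is one possible spelling of such a clause, not necessarily the one referred to above. It relies on the grouping variable g being available inside j as the length-one value of the current group.

```r
library(data.table)

DT <- data.table(y = 10, g = c(1, 1, 1, 2, 2, 2, 2, 2, NA, NA))

# Aggregate first, then overwrite N for the NA group
chained <- DT[, .N, by = g][is.na(g), N := NA]

# Explicit conditional: skip the count when the group value is NA.
# NA_integer_ keeps the column type the same as .N in every group.
conditional <- DT[, .(N = if (is.na(g)) NA_integer_ else .N), by = g]

print(chained)
print(conditional)
```

Both variants keep N as an integer column, whereas the NA^is.na(g) version returns a double.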

Uwe answered Sep 18 '22 10:09
