I'd like to use data.table
but would like skip the calculation of the j
part if the by
corresponds to missing (NA
):
Here is an example data.table
library(data.table)
DT <- data.table(y=10, g=c(1,1,1,2,2,2,2,2,NA,NA))
It looks like this
> DT
y g
1: 10 1
2: 10 1
3: 10 1
4: 10 2
5: 10 2
6: 10 2
7: 10 2
8: 10 2
9: 10 NA
10: 10 NA
Now I'd like to do the by=
on g
and the two rows 9 and 10 will be lumped together because they have the same value NA
.
> DT[,.N, by=g]
g N
1: 1 3
2: 2 5
3: NA 2
I'd like to keep the NA
line in the output but would want to skip the calculate part in the result, ie., get the output, where N
is empty when g
is NA
> DT[,.N, by=g]
g N
1: 1 3
2: 2 5
3: NA NA
I thought I could access the value of g
through .GRP
but that only gives the group index and not the value. Is it possible to make the calculation conditional on the missing status of the by
variable?
You may try this one:
DT[, .N * NA^is.na(g), by = g]
g V1 1: 1 3 2: 2 5 3: NA NA
It is an algebraic version of Henrik's if ... else ...
clause.
It uses the fact that NA^0
returns 1
while NA^1
returns NA
and that FALSE
and TRUE
can be coerced to 0
and 1
, resp.
If you want to control the column name:
DT[, .(n = .N * NA^is.na(g)), by = g]
g n 1: 1 3 2: 2 5 3: NA NA
Alternatively, if above appears to tricky you can resort to data.table chaining (thanks to Sotos for bringing this up):
DT[, .N, by = g][is.na(g), N := NA][]
This will change the value of N
after aggregation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With