I'd like to use data.table for some wrangling, and I want the resulting data table not to include the grouping variable.
Here's an MWE:
library("data.table")
DT <- data.table(x = 1:10, grp = rep(1:2,5))
DT[, .(mmm = mean(x)), by = grp]
This produces:
grp mmm
1: 1 5
2: 2 6
which is fine. However, I'd prefer grp not to be in the result. I can fix this by chaining a second data.table call that sets grp := NULL (or otherwise throws the column away), but can I prevent it in the first call so that only mmm is returned?
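A minimal sketch of the two chaining approaches mentioned in the question, assuming only that data.table is installed. Neither one modifies DT itself: the `:=` acts on the intermediate table returned by the grouped call, not on DT.

```r
library(data.table)

DT <- data.table(x = 1:10, grp = rep(1:2, 5))

# Option 1: delete grp by reference from the intermediate grouped result.
# The trailing [] is only needed for auto-printing at the console,
# because := returns its result invisibly.
res1 <- DT[, .(mmm = mean(x)), by = grp][, grp := NULL][]

# Option 2: keep only the wanted column from the grouped result.
res2 <- DT[, .(mmm = mean(x)), by = grp][, .(mmm)]

print(res1)
print(res2)

# DT itself is untouched: := only modified the temporary result.
print(names(DT))
```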
It isn't clear why you don't want to use chaining. DT[, .(mmm = mean(x)), by = grp][, grp := NULL][] would be my first choice.
Although I wouldn't advise it, you can also use:
DT[, .(mmm = DT[, .(mmm = mean(x)), by = grp]$mmm)]
which will give you the desired result as well:
mmm
1: 5
2: 6
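To see why the nested call works, it can be split into steps. This is a sketch using the MWE's DT; the intermediate names inner, vals, nested and chained are hypothetical and only there for illustration. The inner call is an ordinary grouped aggregation, `$mmm` extracts its result as a plain vector (dropping grp), and the outer call, which has no `by`, just wraps that vector in a one-column data.table.

```r
library(data.table)

DT <- data.table(x = 1:10, grp = rep(1:2, 5))

# Inner call: ordinary grouped aggregation, one row per group.
inner <- DT[, .(mmm = mean(x)), by = grp]

# `$mmm` pulls that column out as a plain numeric vector,
# discarding grp along with the rest of the data.table.
vals <- inner$mmm

# Outer call: no `by`, so j simply wraps the vector into a one-column result.
nested  <- DT[, .(mmm = vals)]
chained <- DT[, .(mmm = mean(x)), by = grp][, grp := NULL][]

print(nested)
print(chained)
```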
Although you get the same result, it is better not to use this method. Its major drawback is that it makes your code unnecessarily complicated when you want to summarise more than one value column. For example, if DT also had a column y, you would get something like:
DT[, .(mx = DT[, .(mx = mean(x)), by = grp]$mx, my = DT[, .(my = mean(y)), by = grp]$my)]
whereas the idiomatic data.table way would be:
DT[, .(mx = mean(x), my = mean(y)), by = grp][, grp := NULL][]
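A runnable comparison of the two forms, as a sketch: the MWE's DT has no y column, so this version adds a hypothetical y = 10 * x purely for illustration. Both forms should produce the same two columns, but the nested one runs a separate grouped call for every value column, while the idiomatic one groups the data only once.

```r
library(data.table)

# Hypothetical: y is not in the original MWE; it is added here for illustration.
DT <- data.table(x = 1:10, y = (1:10) * 10, grp = rep(1:2, 5))

# Nested approach: one separate grouped call per summarised column.
nested <- DT[, .(mx = DT[, .(mx = mean(x)), by = grp]$mx,
                 my = DT[, .(my = mean(y)), by = grp]$my)]

# Idiomatic approach: a single grouped call, then drop grp from the result.
chained <- DT[, .(mx = mean(x), my = mean(y)), by = grp][, grp := NULL][]

print(nested)
print(chained)
```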
To conclude:
chaining with DT[, .(mmm = mean(x)), by = grp][, grp := NULL][] is your best choice.