I want to count and aggregate(sum) a column in a data.table
, and couldn't find the most efficient way to do this. This seems to be close to what I want R summarizing multiple columns with data.table.
My data:
set.seed(321) dat <- data.table(MNTH = c(rep(201501,4), rep(201502,3), rep(201503,5), rep(201504,4)), VAR = sample(c(0,1), 16, replace=T)) > dat MNTH VAR 1: 201501 1 2: 201501 1 3: 201501 0 4: 201501 0 5: 201502 0 6: 201502 0 7: 201502 0 8: 201503 0 9: 201503 0 10: 201503 1 11: 201503 1 12: 201503 0 13: 201504 1 14: 201504 0 15: 201504 1 16: 201504 0
I want to both count and sum VAR
by MNTH
using data.table. The desired result:
MNTH COUNT VAR 1 201501 4 2 2 201502 3 0 3 201503 5 2 4 201504 4 2
In order to use the aggregate function for mean in R, you will need to specify the numerical variable on the first argument, the categorical (as a list) on the second and the function to be applied (in this case mean ) on the third. An alternative is to specify a formula of the form: numerical ~ categorical .
To counts all of the rows in a table, whether they contain NULL values or not, use COUNT(*). That form of the COUNT() function basically returns the number of rows in a result set returned by a SELECT statement.
Think of .N as a variable for the number of instances. For example: dt <- data.table(a = LETTERS[c(1,1:3)], b = 4:7) dt[.N] # returns the last row # a b # 1: C 7.
The post you are referring to gives a method on how to apply one aggregation method to several columns. If you want to apply different aggregation methods to different columns, you can do:
dat[, .(count = .N, var = sum(VAR)), by = MNTH]
this results in:
MNTH count var 1: 201501 4 2 2: 201502 3 0 3: 201503 5 2 4: 201504 4 2
You can also add these values to your existing dataset by updating your dataset by reference:
dat[, `:=` (count = .N, var = sum(VAR)), by = MNTH]
this results in:
> dat MNTH VAR count var 1: 201501 1 4 2 2: 201501 1 4 2 3: 201501 0 4 2 4: 201501 0 4 2 5: 201502 0 3 0 6: 201502 0 3 0 7: 201502 0 3 0 8: 201503 0 5 2 9: 201503 0 5 2 10: 201503 1 5 2 11: 201503 1 5 2 12: 201503 0 5 2 13: 201504 1 4 2 14: 201504 0 4 2 15: 201504 1 4 2 16: 201504 0 4 2
For further reading about how to use data.table syntax, see the Getting started guides on the GitHub wiki.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With