I know this can be achieved with other packages, but I'm trying to do it in data.table
(as it seems to be the fastest for grouping).
library(data.table)
dt = data.table(a=c(1,2,2,3))
dt[,length(a),by=a]
results in
a V1
1: 1 1
2: 2 1
3: 3 1
whereas
library(plyr)  # provides ddply()
df = data.frame(a=c(1,2,2,3))
ddply(df,.(a),summarise,V1=length(a))
produces
a V1
1 1 1
2 2 2
3 3 1
which is a more sensible result. Why doesn't data.table
give the same result, and how can I get it?
The data.table way to do this is to use the special variable .N, which keeps track of the number of rows in the current group. (Other special variables include .SD, .BY (in version 1.8.2), and .I and .GRP (available from version 1.8.3). All are documented in ?data.table.) For example:
library(data.table)
dt = data.table(a=c(1,2,2,3))
dt[, .N, by = a]
# a N
# 1: 1 1
# 2: 2 2
# 3: 3 1
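If you want the count column to have a name other than the default N, one option is to wrap .N in a named list. This is only a small sketch; the column name count is an arbitrary choice for illustration:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 2, 3))

# Same per-group row count as above, but stored in a column named `count`
# (an illustrative name, not something data.table requires).
counts <- dt[, list(count = .N), by = a]
print(counts)
```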
To see why what you tried didn't work, run the following, checking the value of a and length(a) at each browser prompt:
dt[, browser(), by = a]
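If you'd rather not step through an interactive browser session, the same thing can be seen by putting length(a) and .N side by side. This is a minimal sketch; the column names len_a and n are made up for illustration. Inside j, a grouping column is a length-one value for each group, so length(a) is always 1, whereas .N reports how many rows the group has:

```r
library(data.table)

dt <- data.table(a = c(1, 2, 2, 3))

# For each group, `a` is the single grouping value, so length(a) is 1;
# .N is the number of rows that fell into that group.
res <- dt[, list(len_a = length(a), n = .N), by = a]
print(res)
```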