I have a data.table dt:
library(data.table)
dt = data.table(a=LETTERS[c(1,1:3)],b=4:7)
a b
1: A 4
2: A 5
3: B 6
4: C 7
The result of dt[, .N, by=a] is:
a N
1: A 2
2: B 1
3: C 1
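Here is a minimal check that uses only data.table and base R. The N column holds the same per-group row counts that base R's table() reports for column a:

```r
library(data.table)

dt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

# .N inside j is the number of rows in the current group
counts <- dt[, .N, by = a]

# base R's table() tallies the same thing for column a
base_counts <- table(dt$a)

# match the two by group name; both give A = 2, B = 1, C = 1
same <- all(counts$N == as.integer(base_counts[counts$a]))
print(same)
```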
I know that by=a or by="a" means grouping by column a, and that the N column counts how many times each value of a occurs. However, I never call nrow(), yet I still get this result. Isn't .N just a column name? I can't find its documentation with ??".N" in R. I tried .K, but it doesn't work. What does .N mean?
data.table's .N symbol stands for "number of rows." It can be the total number of rows, or the number of rows per group if you're aggregating in the by section. This expression returns the total number of rows in the data.table: mydt[, .N]
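A short sketch of the total and per-group cases side by side, plus one assumption-free extra: when i contains a filter, .N counts only the rows that pass it. The name mydt is just the example table from above:

```r
library(data.table)

mydt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

# No 'by': .N is the total number of rows in the table
total <- mydt[, .N]

# With 'by': .N is evaluated once per group
per_group <- mydt[, .N, by = a]

# With a filter in 'i': .N counts only the rows where b > 4
filtered <- mydt[b > 4, .N]
```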
.SD stands for "Subset of Data.table". The dot before SD has no special meaning; it simply keeps the name from clashing with a user-defined column name.
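Since .SD is the group's subset of rows, nrow(.SD) gives the same number as .N in each group. This sketch only illustrates the relationship; .N is the cheaper way to get the count:

```r
library(data.table)

dt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

# .SD holds the current group's rows (non-'by' columns),
# so its row count equals .N for that group
via_sd <- dt[, .(n_sd = nrow(.SD), n = .N), by = a]
```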
Think of .N as a variable holding the number of instances (rows). For example:
dt <- data.table(a = LETTERS[c(1,1:3)], b = 4:7)
dt[.N] # returns the last row
# a b
# 1: C 7
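Because .N in i is just an integer, ordinary arithmetic works on it, and inside a grouped j, .SD[.N] picks each group's last row. A small sketch on the same table:

```r
library(data.table)

dt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

last_row    <- dt[.N]      # row 4: C 7
second_last <- dt[.N - 1]  # row 3: B 6

# Within each group, .N is the group size, so .SD[.N] is the group's last row
last_per_group <- dt[, .SD[.N], by = a]
```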
Your example, combined with := instead, adds a new variable holding the number of rows per group:
dt[, new_var := .N, by = a]
dt
# a b new_var
# 1: A 4 2 # 2 'A's
# 2: A 5 2
# 3: B 6 1 # 1 'B'
# 4: C 7 1 # 1 'C'
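To contrast the two forms: := keeps every row and attaches the group size, while a plain j = .N collapses to one row per group. A common follow-up, shown here as one possible idiom, keeps only the rows whose value of a is duplicated:

```r
library(data.table)

dt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

# := adds the group size to every row; dt still has 4 rows
dt[, new_var := .N, by = a]

# A plain j = .N gives one row per group
summary_dt <- dt[, .N, by = a]

# Groups returning NULL are dropped, so only duplicated 'a' values remain
dups <- dt[, if (.N > 1) .SD, by = a]
```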
For a list of all special symbols of data.table, see also https://www.rdocumentation.org/packages/data.table/versions/1.10.0/topics/special-symbols
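That page also covers other symbols that are only defined inside dt[...], which is why a name like .K does not work: it is not one of them. A sketch showing .N next to .I (row numbers in dt) and .GRP (the group counter):

```r
library(data.table)

dt <- data.table(a = LETTERS[c(1, 1:3)], b = 4:7)

# .N   : rows in the group
# .I   : row numbers (in dt) of the group's rows; .I[1] is the first one
# .GRP : group counter, 1 for the first group, 2 for the second, ...
info <- dt[, .(N = .N, first_row = .I[1], grp = .GRP), by = a]
```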