I have the following data.table
set.seed(1) DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE)) VAL 1: 1 2: 2 3: 2 4: 3 5: 1 6: 3 7: 3 8: 2 9: 2 10: 1
Within each number in VAL
I want to:
At the end I want the result
VAL COUNT IDX 1: 1 3 1 2: 2 4 1 3: 2 4 2 4: 3 3 1 5: 1 3 2 6: 3 3 2 7: 3 3 3 8: 2 4 3 9: 2 4 4 10: 1 3 3
where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".
I tried to work with which
and length
using .I
:
dt[, list(COUNT = length(VAL == VAL[.I]), IDX = which(which(VAL == VAL[.I]) == .I))]
but this does not work as .I
refers to a vector with the index, so I guess one must use .I[]
. Though inside .I[]
I again face the problem, that I do not have the row index and I do know (from reading data.table
FAQ and following the posts here) that looping through rows should be avoided if possible.
So, what's the data.table
way?
The COUNT(*) function returns the number of rows in a table, including the rows including NULL and duplicates.
data(). rows(). count() gives you the count of rows.
The count() method can be applied to the input dataframe containing one or more columns and returns a frequency count corresponding to each of the groups.
If you need a quick way to count rows that contain data, select all the cells in the first column of that data (it may not be column A). Just click the column header. The status bar, in the lower-right corner of your Excel window, will tell you the row count.
Using .N
...
DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ] # VAL COUNT IDX # 1: 1 3 1 # 2: 2 4 1 # 3: 2 4 2 # 4: 3 3 1 # 5: 1 3 2 # 6: 3 3 2 # 7: 3 3 3 # 8: 2 4 3 # 9: 2 4 4 #10: 1 3 3
.N
is the number of records in each group, with groups defined by "VAL"
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With