Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count number of records and generate row number within each group in a data.table

Tags:

r

data.table

I have the following data.table

set.seed(1) DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))     VAL  1:   1  2:   2  3:   2  4:   3  5:   1  6:   3  7:   3  8:   2  9:   2 10:   1 

Within each number in VAL I want to:

  1. Count the number of records/rows
  2. Create an row index (counter) of first, second, third occurrence et c.

At the end I want the result

    VAL COUNT IDX  1:   1     3   1  2:   2     4   1  3:   2     4   2  4:   3     3   1  5:   1     3   2  6:   3     3   2  7:   3     3   3  8:   2     4   3  9:   2     4   4 10:   1     3   3 

where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with which and length using .I:

 dt[, list(COUNT = length(VAL == VAL[.I]),               IDX = which(which(VAL == VAL[.I]) == .I))] 

but this does not work as .I refers to a vector with the index, so I guess one must use .I[]. Though inside .I[] I again face the problem, that I do not have the row index and I do know (from reading data.table FAQ and following the posts here) that looping through rows should be avoided if possible.

So, what's the data.table way?

like image 338
Simon Z. Avatar asked Nov 08 '13 21:11

Simon Z.


People also ask

Which function counts the number of rows in a group?

The COUNT(*) function returns the number of rows in a table, including the rows including NULL and duplicates.

How do you count rows in a data table?

data(). rows(). count() gives you the count of rows.

How do I count the number of rows in each group in R?

The count() method can be applied to the input dataframe containing one or more columns and returns a frequency count corresponding to each of the groups.

How do I count the number of rows in each column?

If you need a quick way to count rows that contain data, select all the cells in the first column of that data (it may not be column A). Just click the column header. The status bar, in the lower-right corner of your Excel window, will tell you the row count.


1 Answers

Using .N...

DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ] #    VAL COUNT IDX # 1:   1     3   1 # 2:   2     4   1 # 3:   2     4   2 # 4:   3     3   1 # 5:   1     3   2 # 6:   3     3   2 # 7:   3     3   3 # 8:   2     4   3 # 9:   2     4   4 #10:   1     3   3 

.N is the number of records in each group, with groups defined by "VAL".

like image 105
Simon O'Hanlon Avatar answered Sep 19 '22 00:09

Simon O'Hanlon