Count number of records and generate row number within each group in a data.table

Tags: r, data.table

I have the following data.table

    library(data.table)
    set.seed(1)
    DT <- data.table(VAL = sample(c(1, 2, 3), 10, replace = TRUE))

        VAL
     1:   1
     2:   2
     3:   2
     4:   3
     5:   1
     6:   3
     7:   3
     8:   2
     9:   2
    10:   1

Within each number in VAL I want to:

1. Count the number of records/rows
2. Create a row index (counter) of first, second, third occurrence, etc.

At the end I want the result

        VAL COUNT IDX
     1:   1     3   1
     2:   2     4   1
     3:   2     4   2
     4:   3     3   1
     5:   1     3   2
     6:   3     3   2
     7:   3     3   3
     8:   2     4   3
     9:   2     4   4
    10:   1     3   3

where "COUNT" is the number of records/rows for each "VAL", and "IDX" is the row index within each "VAL".

I tried to work with which and length using .I:

    DT[, list(COUNT = length(VAL == VAL[.I]),
              IDX = which(which(VAL == VAL[.I]) == .I))]

but this does not work because .I is a vector of row indices rather than the index of a single row, so I guess one must subset it as .I[]. Inside .I[], though, I face the same problem: I do not have the current row index. I also know (from reading the data.table FAQ and following the posts here) that looping through rows should be avoided if possible.
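To make the behaviour of .I concrete, here is a minimal sketch of what it evaluates to with and without `by`. It assumes the VAL values printed in the table above, typed in by hand rather than regenerated with sample(), since the draw for a given seed can differ between R versions; the names `all_rows` and `rows_by_val` are only illustrative.

```r
library(data.table)

# VAL values copied by hand from the printed table in the question (assumption:
# this avoids relying on sample() reproducing the same draw on every R version).
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# Without `by`, .I is the whole vector of row numbers, not a "current row".
all_rows <- DT[, .I]
print(all_rows)

# With `by`, .I holds the original row numbers that belong to each group.
rows_by_val <- DT[, .(ROW = .I), by = VAL]
print(rows_by_val)
```

So there is no per-row value of .I to compare against inside `j`; it is always a vector, either for the whole table or for the current group.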

So, what's the data.table way?

338

asked Nov 08 '13 21:11

Simon Z.

1 Answer

Using .N...

    DT[ , `:=`( COUNT = .N , IDX = 1:.N ) , by = VAL ]

    #    VAL COUNT IDX
    # 1:   1     3   1
    # 2:   2     4   1
    # 3:   2     4   2
    # 4:   3     3   1
    # 5:   1     3   2
    # 6:   3     3   2
    # 7:   3     3   3
    # 8:   2     4   3
    # 9:   2     4   4
    #10:   1     3   3

.N is the number of records in each group, with groups defined by "VAL".
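As a self-contained check of the approach above, the following sketch rebuilds the table from the VAL values printed in the question (typed in by hand, since sample() may give a different draw for the same seed on other R versions) and adds both columns by group. The `IDX_ALT` column is an addition for illustration only: it uses data.table's `rowid()` as one possible alternative for the within-group counter.

```r
library(data.table)

# VAL values copied by hand from the question's printed table (assumption:
# this keeps the result independent of the R version's random number generator).
DT <- data.table(VAL = c(1, 2, 2, 3, 1, 3, 3, 2, 2, 1))

# .N is a single number per group, which := recycles to every row of that group;
# seq_len(.N) gives 1, 2, ..., .N in the group's row order.
DT[, `:=`(COUNT = .N, IDX = seq_len(.N)), by = VAL]

# Illustrative alternative: rowid() numbers the occurrences of each value
# without needing `by`.
DT[, IDX_ALT := rowid(VAL)]

print(DT)
```

Within a group .N is at least 1, so `1:.N` and `seq_len(.N)` behave the same here; `seq_len()` is used only as a defensive habit.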

105

answered Sep 19 '22 00:09

Simon O'Hanlon
