I have a data.table that looks like this
ID, Order, Segment
1, 1, A
1, 2, B
1, 3, B
1, 4, C
1, 5, B
1, 6, B
1, 7, B
1, 8, B
Basically, after ordering the data by the Order column, I would like to count the number of consecutive B's for each ID. Ideally the output I would like is
ID, Consec
1, 2
1, 4
because segment B appears consecutively in rows 2 and 3 (2 times), and then again in rows 5, 6, 7 and 8 (4 times).
The loop solution is quite obvious but would also be very slow.
Are there elegant solutions in data.table that are also fast?
P.S. The data I am dealing with has ~20 million rows.
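A minimal sketch to reconstruct the sample data (the name DT is an assumption, chosen to match the answers below):

library(data.table)
DT <- data.table(ID      = rep(1L, 8),
                 Order   = 1:8,
                 Segment = c("A", "B", "B", "C", "B", "B", "B", "B"))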
Try
library(data.table) # v1.9.5+
DT[order(ID, Order)                # sort by ID, then Order
   ][, indx := rleid(Segment)      # id each run of consecutive Segment values
   ][Segment == 'B',               # keep only the runs of B
     list(Consec = .N), by = list(indx, ID)][, indx := NULL][]
# ID Consec
#1: 1 2
#2: 1 4
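The helper column indx comes from rleid() (run-length id), which starts a new id every time the value changes, so each run of identical Segment values shares one id. Applied to the example's Segment column:

rleid(c("A", "B", "B", "C", "B", "B", "B", "B"))
# [1] 1 2 2 3 4 4 4 4

Grouping by indx (and ID) and taking .N then yields the length of each B run.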
Or as @eddi suggested
DT[order(ID, Order)][, .(Consec = .N),
   by = .(ID, Segment, rleid(Segment))   # group by the run id directly
   ][Segment == 'B', .(ID, Consec)]
# ID Consec
#1: 1 2
#2: 1 4
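This variant computes the run id directly inside by, so it avoids creating the temporary indx column and deleting it afterwards.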
A more memory-efficient method is to use setorder instead of order (as suggested by @Arun); setorder sorts the data.table by reference, so no reordered copy is made:
setorder(DT, ID, Order)[, .(Consec = .N), by = .(ID, Segment,
rleid(Segment))][Segment == 'B', .(ID, Consec)]
# ID Consec
#1: 1 2
#2: 1 4
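Given the ~20 million rows mentioned in the question, here is a rough timing sketch on simulated data of a similar scale (the number of IDs and the segment mix are illustrative assumptions, not taken from the question):

set.seed(1)
n  <- 2e7
DT <- data.table(ID      = sample(1e5L, n, replace = TRUE),
                 Order   = sample(n),
                 Segment = sample(c("A", "B", "C"), n, replace = TRUE))
system.time(
  setorder(DT, ID, Order)[, .(Consec = .N),
    by = .(ID, Segment, rleid(Segment))][Segment == "B", .(ID, Consec)]
)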