Creating a sequence in a data.table depending on a column


Here is an option:

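library(data.table)

set.seed(1)
DT <- data.table(R=sample(0:1, 10000, replace=TRUE), Seq=0L)
# group by a run counter that increments whenever R changes, then count within each run
DT[, Seq:=seq(.N), by=list(cumsum(c(0, abs(diff(R)))))]
DT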

We create a counter that increments every time your 0-1 variable changes, using cumsum(abs(diff(R))). Because diff() returns a vector one element shorter than its input, we prepend a 0 with c(0, ...) so the counter has the same length as R. We then group on that counter with by. This produces:

       R Seq
    1: 0   1
    2: 0   2
    3: 1   1
    4: 1   2
    5: 0   1
   ---      
 9996: 1   1
 9997: 0   1
 9998: 1   1
 9999: 1   2
10000: 1   3
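
To check the grouping logic by hand, here is a minimal base R sketch on a tiny made-up vector. The names run_id and seq_in_run are illustrative only and are not part of the answer's code, and ave() stands in here for the data.table by step.

```r
# Toy 0/1 vector, invented for illustration
R <- c(0L, 0L, 1L, 1L, 1L, 0L, 1L)

# diff(R) is one element shorter than R, so prepend 0 to restore the length;
# abs() makes both 0->1 and 1->0 changes count as +1
run_id <- cumsum(c(0, abs(diff(R))))
print(run_id)      # 0 0 1 1 1 2 3

# Count 1, 2, 3, ... within each run (base R equivalent of seq(.N) by group)
seq_in_run <- ave(R, run_id, FUN = seq_along)
print(seq_in_run)  # 1 2 1 2 3 1 1
```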

EDIT: Addressing request for clarification:

Let's look at the computation I'm using in by, broken down into two new columns:

DT[, diff:=c(0, diff(R))]
DT[, cumsum:=cumsum(abs(diff))]
print(DT, topn=10)

Produces:

       R Seq diff cumsum
    1: 0   1    0      0
    2: 0   2    0      0
    3: 1   1    1      1
    4: 1   2    0      1
    5: 0   1   -1      2
    6: 1   1    1      3
    7: 1   2    0      3
    8: 1   3    0      3
    9: 1   4    0      3
   10: 0   1   -1      4
   ---                  
 9991: 1   2    0   5021
 9992: 1   3    0   5021
 9993: 1   4    0   5021
 9994: 1   5    0   5021
 9995: 0   1   -1   5022
 9996: 1   1    1   5023
 9997: 0   1   -1   5024
 9998: 1   1    1   5025
 9999: 1   2    0   5025
10000: 1   3    0   5025

You can see how the cumulative sum of the absolute value of the diff increments by one each time R changes. We can then use that cumsum column to break the data.table into chunks and, for each chunk, generate a sequence with seq(.N) that counts up to the number of rows in the chunk (.N is exactly that: the number of rows in each by group).
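
As a side note, the same run counter can probably be obtained more directly. The sketch below is an alternative to the code above, not the method used in this answer: it assumes a data.table version that provides rleid() (1.9.6 or later), and DT2, Seq and Seq2 are names made up for the example.

```r
library(data.table)

# Small made-up table so the result can be checked by eye
DT2 <- data.table(R = c(0L, 0L, 1L, 1L, 1L, 0L, 1L))

# rleid(R) gives one id per run of identical consecutive values,
# playing the same role as cumsum(c(0, abs(diff(R))))
DT2[, Seq := seq_len(.N), by = rleid(R)]

# Base R cross-check: expand the run lengths into 1..n sequences
DT2[, Seq2 := sequence(rle(R)$lengths)]

print(DT2)
```

Unlike the diff-based counter, rleid() does not rely on R being numeric, so it should also work when the column is a character or factor.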

196

answered Oct 04 '22 18:10

BrodieG


Tags:

r

data.table

wizard_draziw
