How to change a variable at the nth occurrence of a value in another variable?

Tags: r, data.table

I have a data.table:

library(data.table)
car <- data.table(no = 1:100, turn = sample(1:5,100,replace = TRUE), 
              dis = sample(1:10,100,replace = TRUE))

I want to change "dis" to -1, at the nth occurrence of turn == 3, say the third time that "turn" is 3.

I can select the third row of turn == 3:

car[turn == 3, .SD[3]]

However, I can't manage to update "dis" in this row:

car[turn == 3, .SD[3]][, dis := -1]

A related Q&A: Conditionally replacing column values with data.table.

406

asked Aug 28 '17 08:08

1 Answer

Some alternatives. Use rowid or cumsum to create a counter of rows within groups. Add the counter to your condition in i.
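As background, the attempt in the question does not stick because car[turn == 3, .SD[3]] returns a new one-row data.table, and the chained := then modifies that temporary result rather than car. A minimal sketch of this, assuming a deterministic stand-in for the question's random table (the rep() columns are my assumption, chosen so that the third turn == 3 is always row 13):

```r
library(data.table)

# Deterministic stand-in for the question's table (assumption: rep() instead
# of sample(), so the result is reproducible)
car <- data.table(no = 1:100,
                  turn = rep(1:5, 20),
                  dis = rep(1:10, 10))

before <- copy(car)

# The first [ ] builds a new one-row table; := updates that copy only
tmp <- car[turn == 3, .SD[3]][, dis := -1L]

# car is untouched, only the temporary result carries the new value
unchanged <- identical(car$dis, before$dis)
```

Here tmp holds the updated row while car keeps its original "dis" values, which is why the condition has to go into i of a single call on car itself.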

I use a slightly smaller toy data set, just to make it easier to track the changes:

d <- data.table(x = 1:3, y = 1:12)

d[rowid(x) == 3 & x == 3, y := -1]

# @mt1022
d[cumsum(x == 3) == 3 & (x == 3), y := -1]

# @docendo discimus
d[(ix <- x == 3) & cumsum(ix) == 3, y := -1]
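To see what the counters look like, the sketch below prints rowid(x) and cumsum(x == 3) next to the toy data and locates the matching row. The column names by_rowid and by_cumsum and the variables hit_rowid and hit_cumsum are names I made up for illustration, and x is spelled out with rep() rather than relying on recycling.

```r
library(data.table)

d <- data.table(x = rep(1:3, 4), y = 1:12)

# Show both counters side by side (nothing is added to d here)
print(d[, .(x, y, by_rowid = rowid(x), by_cumsum = cumsum(x == 3))])

# cumsum(x == 3) stays at 3 on the rows after the third match as well,
# so the extra x == 3 condition is needed to pick a single row
hit_rowid  <- d[, which(x == 3 & rowid(x) == 3)]
hit_cumsum <- d[, which(x == 3 & cumsum(x == 3) == 3)]

# Update by reference using the condition directly in i
d[rowid(x) == 3 & x == 3, y := -1L]
```

Both conditions select the same single row, and only that row's y is changed.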

Although OP didn't mention speed as an issue, I was still curious to time the different approaches on a larger vector. Unsurprisingly, @Frank's method is the fastest, especially so when the number of unique values to search among increases:

frank << docendo < henrik < mt1022

library(microbenchmark)

# d is redefined before each run (see the data sets below)
microbenchmark(henrik = d[rowid(x) == 3 & x == 3, y := -1],
               mt1022 = d[cumsum(x == 3) == 3 & (x == 3), y := -1],
               docendo = d[(ix <- x == 3) & cumsum(ix) == 3, y := -1],
               frank = d[d[x == 3, which = TRUE][3], y := -1],
               unit = "relative")

d <- data.table(x = sample(1:3, 1e6, replace = TRUE), y = 1:1e6)
# Unit: relative
#    expr      min       lq     mean   median       uq      max neval cld
#  henrik 4.417303 4.369407 4.133514 4.319839 4.329658 1.260394   100  b 
#  mt1022 5.461961 5.285562 5.174559 5.186404 5.239738 1.608712   100   c
# docendo 3.572646 3.624369 3.788678 3.589705 3.576637 1.733272   100  b 
#   frank 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100 a 

d <- data.table(x = sample(1:30, 1e6, replace = TRUE), y = 1:1e6)
# Unit: relative
#    expr      min       lq     mean   median       uq      max neval cld
#  henrik 22.64881 19.54375 18.81963 18.91335 19.78559 5.507692   100  bc
#  mt1022 24.58258 21.17535 19.84417 20.96256 22.76020 3.625263   100   c
# docendo 19.40044 16.75912 16.23321 16.47953 18.06264 4.234100   100  b 
#   frank  1.00000  1.00000  1.00000  1.00000  1.00000 1.000000   100 a

d <- data.table(x = sample(1:300, 1e6, replace = TRUE), y = 1:1e6)
# Unit: relative
#    expr      min       lq     mean   median       uq       max neval cld
#  henrik 31.81237 32.51122 28.79490 30.35766 28.63560  8.236282   100  b 
#  mt1022 34.71984 35.45341 33.20405 33.57394 31.50914 21.556367   100   c
# docendo 27.99046 28.15855 26.56954 26.60644 25.20044  7.847163   100  b 
#   frank  1.00000  1.00000  1.00000  1.00000  1.00000  1.000000   100 a

# Unit: milliseconds
#    expr       min        lq      mean    median       uq        max neval cld
#  henrik 60.655582 76.455531 83.061266 77.632036 78.57818 203.224042   100   c
#  mt1022 66.701182 84.133034 87.967300 84.937201 85.72464 201.167914   100   c
# docendo 52.938545 67.214360 71.558130 68.003891 68.51897 184.178346   100  b 
#   frank  1.977821  2.494039  2.629852  2.663577  2.76089   3.613905   100 a
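The fastest variant in the timings above first asks for the row numbers where x == 3 (which = TRUE), picks the nth of them, and then updates by row number. A minimal sketch on the toy data, with a length check that I added as an assumption about how one might want to handle fewer than n matches (the variables n and rows are illustrative names):

```r
library(data.table)

d <- data.table(x = rep(1:3, 4), y = 1:12)
n <- 3L

# Row numbers of all rows where x == 3
rows <- d[x == 3, which = TRUE]

# Update only if an nth occurrence exists (guard added for illustration)
if (length(rows) >= n) {
  d[rows[n], y := -1L]
}
```

This avoids building a counter over the whole column, which may be part of why it scales better in the benchmark, although I have not verified that explanation.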

156

answered Sep 17 '22 22:09

Henrik

Gauss.Y
