There is a data.table
library(data.table)
car <- data.table(no = 1:100, turn = sample(1:5,100,replace = TRUE),
dis = sample(1:10,100,replace = TRUE))
I want to change "dis" to -1, at the nth occurrence of turn == 3
, say the third time that "turn" is 3
.
I can select the third row of turn == 3
:
car[turn == 3, .SD[3]]
However, I don't manage to update "dis" at this row:
car[turn == 3, .SD[3]][, dis := -1]
A related Q&A: Conditionally replacing column values with data.table.
replace (old, new[, count]) -> string Return a copy of string S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
Use two parameter form of indexOf and start searching at previously found position + 1.
Some alternatives. Use rowid
or cumsum
to create a counter of rows within groups. Add the counter to your condition in i
.
I use a slightly smaller toy data set, just to make it easier to track the changes:
d <- data.table(x = 1:3, y = 1:12)
d[rowid(x) == 3 & x == 3, y := -1]
# @mt1022
d[cumsum(x == 3) == 3 & (x == 3), y := -1]
# @docendo discimus
d[(ix <- x == 3) & cumsum(ix) == 3, y := -1]
Although OP didn't mention speed as an issue, I was still curious to time the different approaches on a larger vector. Unsurprisingly, @Frank's method is the fastest, especially so when the number of unique values to search among increases:
frank << docendo < henrik < mt022
microbenchmark(henrik = d[rowid(x) == 3 & x == 3, y := -1],
mt1022 = d[cumsum(x == 3) == 3 & (x == 3), y := -1],
docendo = d[(ix <- x == 3) & cumsum(ix) == 3, y := -1],
frank = d[d[x == 3, which = TRUE][3], y := -1], unit = "relative")
d <- data.table(x = sample(1:3, 1e6, replace = TRUE), y = 1:1e6)
# Unit: relative
# expr min lq mean median uq max neval cld
# henrik 4.417303 4.369407 4.133514 4.319839 4.329658 1.260394 100 b
# mt1022 5.461961 5.285562 5.174559 5.186404 5.239738 1.608712 100 c
# docendo 3.572646 3.624369 3.788678 3.589705 3.576637 1.733272 100 b
# frank 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
d <- data.table(x = sample(1:30, 1e6, replace = TRUE), y = 1:1e6)
# Unit: relative
# expr min lq mean median uq max neval cld
# henrik 22.64881 19.54375 18.81963 18.91335 19.78559 5.507692 100 bc
# mt1022 24.58258 21.17535 19.84417 20.96256 22.76020 3.625263 100 c
# docendo 19.40044 16.75912 16.23321 16.47953 18.06264 4.234100 100 b
# frank 1.00000 1.00000 1.00000 1.00000 1.00000 1.000000 100 a
d <- data.table(x = sample(1:300, 1e6, replace = TRUE), y = 1:1e6)
# Unit: relative
# expr min lq mean median uq max neval cld
# henrik 31.81237 32.51122 28.79490 30.35766 28.63560 8.236282 100 b
# mt1022 34.71984 35.45341 33.20405 33.57394 31.50914 21.556367 100 c
# docendo 27.99046 28.15855 26.56954 26.60644 25.20044 7.847163 100 b
# frank 1.00000 1.00000 1.00000 1.00000 1.00000 1.000000 100 a
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# henrik 60.655582 76.455531 83.061266 77.632036 78.57818 203.224042 100 c
# mt1022 66.701182 84.133034 87.967300 84.937201 85.72464 201.167914 100 c
# docendo 52.938545 67.214360 71.558130 68.003891 68.51897 184.178346 100 b
# frank 1.977821 2.494039 2.629852 2.663577 2.76089 3.613905 100 a
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With