In a data.table, if a certain column has identical values occurring consecutively more than a certain number of times, I'd like to remove the corresponding rows. I would also like to do this by group.
For example, say dt is my data.table. I would like to remove rows if the same value occurs consecutively more than 2 times in Petal.Width, grouped by Species.
library(data.table)
dt <- iris[c(1:3, 7, 51:53, 62:63), ]
setDT(dt)
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 1           5.1         3.5          1.4         0.2     setosa
# 2           4.9         3.0          1.4         0.2     setosa
# 3           4.7         3.2          1.3         0.2     setosa
# 7           4.6         3.4          1.4         0.3     setosa
# 51          7.0         3.2          4.7         1.4 versicolor
# 52          6.4         3.2          4.5         1.5 versicolor
# 53          6.9         3.1          4.9         1.5 versicolor
# 62          5.9         3.0          4.2         1.5 versicolor
# 63          6.0         2.2          4.0         1.0 versicolor
The desired outcome is a data.table with the following rows.
# 7           4.6         3.4          1.4         0.3     setosa
# 51          7.0         3.2          4.7         1.4 versicolor
# 63          6.0         2.2          4.0         1.0 versicolor
Here is an option:
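library(data.table)
# Keep only rows whose run of identical (Species, Petal.Width) values has length 1.
# rw > 1 flags runs of two or more rows; use rw > 2 to drop only runs longer than two.
setDT(dt)[dt[, {
  rl <- rleid(Species, Petal.Width)  # run id: increments whenever either value changes
  rw <- rowid(rl)                    # position of each row within its run
  .I[!rl %in% rl[rw > 1]]            # row numbers in runs that never reach a second row
}]]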
Output:
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1:          4.6         3.4          1.4         0.3     setosa
2:          7.0         3.2          4.7         1.4 versicolor
3:          6.0         2.2          4.0         1.0 versicolor
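If the cutoff needs to be adjustable, the same idea can be wrapped in a small function. This is only a sketch: `drop_long_runs` and its `max_run` argument are hypothetical names that do not come from the question or from data.table, and the columns are hard-coded to `Species` and `Petal.Width` as in the example. It computes the length of the run each row belongs to and keeps rows whose run is no longer than `max_run`.

```r
library(data.table)

# Hypothetical helper (not part of data.table): drop every row that belongs
# to a run of identical (Species, Petal.Width) values longer than `max_run`.
drop_long_runs <- function(dt, max_run = 2L) {
  # One integer id per consecutive stretch of identical value pairs.
  rl <- rleid(dt$Species, dt$Petal.Width)
  # tabulate() counts rows per run id; indexing by rl maps that count to each row.
  run_len <- tabulate(rl)[rl]
  dt[run_len <= max_run]
}

dt <- as.data.table(iris[c(1:3, 7, 51:53, 62:63), ])
res <- drop_long_runs(dt, max_run = 2L)
print(res)
```

On the example data this keeps the same three rows as above, because every repeated value there occurs three times in a row. The results would differ only when a value repeats exactly twice: such a pair is kept with `max_run = 2L` and removed with `max_run = 1L`.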