I have recently looked for advice on how to suppress all but the first occurrences of a value within a group using dplyr (dplyr override all but the first occurrences of a value within a group).
The solution was a really clever one and now I am struggling with finding something equally efficient in case I need to suppress only n next values.
For example, in the code below I create a new "tag" column:
library('dplyr')
data(iris)
set.seed(1)
iris$tag <- sample(c(0,1), 150, replace=TRUE, prob = c(0.7, 0.3))
giris <- iris %>% group_by(Species)
# Source: local data frame [150 x 6]
# Groups: Species [3]
#
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species tag
# (dbl) (dbl) (dbl) (dbl) (fctr) (dbl)
# 1 5.1 3.5 1.4 0.2 setosa 0
# 2 4.9 3.0 1.4 0.2 setosa 0
# 3 4.7 3.2 1.3 0.2 setosa 0
# 4 4.6 3.1 1.5 0.2 setosa 1
# 5 5.0 3.6 1.4 0.2 setosa 0
# 6 5.4 3.9 1.7 0.4 setosa 1
# 7 4.6 3.4 1.4 0.3 setosa 1
# 8 5.0 3.4 1.5 0.2 setosa 0
# 9 4.4 2.9 1.4 0.2 setosa 0
# 10 4.9 3.1 1.5 0.1 setosa 0
# .. ... ... ... ... ... ...
In the setosa group rows: 4, 6, 7, ... are tagged as "1"s. I am trying to suppress "1"s (i.e. convert them to "0"s) in the next two rows after any occurrence of a "1". In other words, rows #5 and #6 should be set to "0" but #7 should remain unaffected. In this case, row #7 happens to be a "1", so rows #8 and #9 should be set to "0"s and so on...
Any hint on how to do this in dplyr? This package is really powerful but for a reason it is a mental challenge for me to master all the subtleties...
Some more examples: in case of: 0 0 1 1, the output should be 0 0 1 0 in case of: 0 0 1 1 1 1 1, the output should be 0 0 1 0 0 1 0
I can't think of any better way to do this than a loop:
flip_followers = function(tag, nf = 2L){
w = which(tag==1L)
keep = rep(TRUE, length(w))
for (i in seq_along(w)) if (keep[i]) keep[match(w[i]+seq_len(nf), w)] = FALSE
tag[w[!keep]] = 0L
tag
}
giris %>% mutate(tag = flip_followers(tag))
Source: local data frame [150 x 6]
Groups: Species [3]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species tag
(dbl) (dbl) (dbl) (dbl) (fctr) (dbl)
1 5.1 3.5 1.4 0.2 setosa 0
2 4.9 3.0 1.4 0.2 setosa 0
3 4.7 3.2 1.3 0.2 setosa 0
4 4.6 3.1 1.5 0.2 setosa 1
5 5.0 3.6 1.4 0.2 setosa 0
6 5.4 3.9 1.7 0.4 setosa 0
7 4.6 3.4 1.4 0.3 setosa 1
8 5.0 3.4 1.5 0.2 setosa 0
9 4.4 2.9 1.4 0.2 setosa 0
10 4.9 3.1 1.5 0.1 setosa 0
.. ... ... ... ... ... ...
For a possible speedup, you could switch the loop to if (keep[i]) keep[i+seq_len(nf)][match(w[i]+seq_len(nf), w[i+seq_len(nf)])] = FALSE
so that match
only searches the next nf
elements of w
. I'm sure Rcpp would be faster still, if that's a serious concern.
To me this is semantically clearer if you use an accumulating reduce to keep track of the refraction period.
suppress <- function(x, w) {
r <- Reduce(function(d,i) if(i&!d) w else max(0,d-1), x, init=0, acc=TRUE)[-1]
x * (r==w)
}
Example
suppress(c(0,0,1,1,1,1,1), 2)
#> [1] 0 0 1 0 0 1 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With