Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dplyr suppress next n occurrences of a value in a group

Tags:

r

dplyr

I have recently looked for advice on how to suppress all but the first occurrences of a value within a group using dplyr (dplyr override all but the first occurrences of a value within a group).

The solution was a really clever one and now I am struggling with finding something equally efficient in case I need to suppress only n next values.

For example, in the code below I create a new "tag" column:

library('dplyr')
data(iris)
set.seed(1)
iris$tag <- sample(c(0,1), 150, replace=TRUE, prob = c(0.7, 0.3))
giris <- iris %>% group_by(Species)

# Source: local data frame [150 x 6]
# Groups: Species [3]
# 
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   tag
#           (dbl)       (dbl)        (dbl)       (dbl)  (fctr) (dbl)
# 1           5.1         3.5          1.4         0.2  setosa     0
# 2           4.9         3.0          1.4         0.2  setosa     0
# 3           4.7         3.2          1.3         0.2  setosa     0
# 4           4.6         3.1          1.5         0.2  setosa     1
# 5           5.0         3.6          1.4         0.2  setosa     0
# 6           5.4         3.9          1.7         0.4  setosa     1
# 7           4.6         3.4          1.4         0.3  setosa     1
# 8           5.0         3.4          1.5         0.2  setosa     0
# 9           4.4         2.9          1.4         0.2  setosa     0
# 10          4.9         3.1          1.5         0.1  setosa     0
# ..          ...         ...          ...         ...     ...   ...

In the setosa group rows: 4, 6, 7, ... are tagged as "1"s. I am trying to suppress "1"s (i.e. convert them to "0"s) in the next two rows after any occurrence of a "1". In other words, rows #5 and #6 should be set to "0" but #7 should remain unaffected. In this case, row #7 happens to be a "1", so rows #8 and #9 should be set to "0"s and so on...

Any hint on how to do this in dplyr? This package is really powerful but for a reason it is a mental challenge for me to master all the subtleties...


Some more examples: in case of: 0 0 1 1, the output should be 0 0 1 0 in case of: 0 0 1 1 1 1 1, the output should be 0 0 1 0 0 1 0

like image 887
rpl Avatar asked Mar 18 '16 15:03

rpl


Video Answer


2 Answers

I can't think of any better way to do this than a loop:

flip_followers = function(tag, nf = 2L){
    w    = which(tag==1L)
    keep = rep(TRUE, length(w))
    for (i in seq_along(w)) if (keep[i]) keep[match(w[i]+seq_len(nf), w)] = FALSE
    tag[w[!keep]] = 0L
    tag
}

giris %>% mutate(tag = flip_followers(tag))



Source: local data frame [150 x 6]
Groups: Species [3]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   tag
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr) (dbl)
1           5.1         3.5          1.4         0.2  setosa     0
2           4.9         3.0          1.4         0.2  setosa     0
3           4.7         3.2          1.3         0.2  setosa     0
4           4.6         3.1          1.5         0.2  setosa     1
5           5.0         3.6          1.4         0.2  setosa     0
6           5.4         3.9          1.7         0.4  setosa     0
7           4.6         3.4          1.4         0.3  setosa     1
8           5.0         3.4          1.5         0.2  setosa     0
9           4.4         2.9          1.4         0.2  setosa     0
10          4.9         3.1          1.5         0.1  setosa     0
..          ...         ...          ...         ...     ...   ...

For a possible speedup, you could switch the loop to if (keep[i]) keep[i+seq_len(nf)][match(w[i]+seq_len(nf), w[i+seq_len(nf)])] = FALSE so that match only searches the next nf elements of w. I'm sure Rcpp would be faster still, if that's a serious concern.

like image 76
Frank Avatar answered Sep 25 '22 04:09

Frank


To me this is semantically clearer if you use an accumulating reduce to keep track of the refraction period.

suppress <- function(x, w) {
  r <- Reduce(function(d,i) if(i&!d) w else max(0,d-1), x, init=0, acc=TRUE)[-1] 
  x * (r==w)
}

Example

suppress(c(0,0,1,1,1,1,1), 2)
#>     [1] 0 0 1 0 0 1 0
like image 27
A. Webb Avatar answered Sep 24 '22 04:09

A. Webb