
Fill a column with a vector if condition is met

Tags: r, dplyr
I am trying to solve the following problem. I have a tibble:

> tibble( signal = c(0,1,0,0,1,0,0,1,1,1,1,1,1,0), days =0)
# A tibble: 14 x 2
   signal  days
    <dbl> <dbl>
 1      0     0
 2      1     0
 3      0     0
 4      0     0
 5      1     0
 6      0     0
 7      0     0
 8      1     0
 9      1     0
10      1     0
11      1     0
12      1     0
13      1     0
14      0     0

I need to fill the days column as follows:

  • find the first row where signal == 1 and fill days with the vector 1, 2, 3, 4 starting at that row, ignoring any signal values inside that run
  • once the run is over, find the next row where signal == 1 and fill days with 1, 2, 3, 4 again

So, the result will look like:

signal  days
    <dbl> <dbl>
 1      0     0
 2      1     1
 3      0     2
 4      0     3
 5      1     4
 6      0     0
 7      0     0
 8      1     1
 9      1     2
10      1     3
11      1     4
12      1     1
13      1     2
14      0     3

I can do it with a for loop, but I am having a hard time vectorizing it, preferably with dplyr.
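For reference, the loop version might look like the sketch below. It assumes the rule described above: a run of 1, 2, 3, 4 starts at a row with signal == 1, signals inside a run are ignored, and a run that reaches the end of the data is simply cut short. The name fill_days and its run argument are illustrative only.

```r
# Hypothetical helper: fill_days() and `run` are illustrative names.
fill_days <- function(signal, run = 4L) {
  days <- integer(length(signal))
  pos <- 0L                     # position inside the current run; 0 means no active run
  for (i in seq_along(signal)) {
    if (pos > 0L && pos < run) {
      pos <- pos + 1L           # continue the run, ignoring signal[i]
    } else if (signal[i] == 1) {
      pos <- 1L                 # start a new run at this row
    } else {
      pos <- 0L                 # idle row
    }
    days[i] <- pos
  }
  days
}

signal <- c(0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0)
fill_days(signal)
#> [1] 0 1 2 3 4 0 0 1 2 3 4 1 2 3
```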

Appreciate any help!

asked Sep 17 '20 13:09 by kujo


2 Answers

Here is something basic with data.table::set()

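library(data.table)
# df is the tibble from the question
i <- 1L
n <- nrow(df)
while (i <= n) {                 # <= so that a signal in the last row is handled too
  if (df$signal[i] == 1) {
    k <- min(i + 3L, n)          # last row of this run, clipped at the end of the table
    set(df, i = i:k, j = "days", value = 1L:(k - i + 1L))
    i <- i + 4L                  # jump past the run
  } else {
    i <- i + 1L
  }
}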

#    signal days
# 1       0    0
# 2       1    1
# 3       0    2
# 4       0    3
# 5       1    4
# 6       0    0
# 7       0    0
# 8       1    1
# 9       1    2
# 10      1    3
# 11      1    4
# 12      1    1
# 13      1    2
# 14      0    3
answered Sep 16 '22 19:09 by sindri_baldur


Here's an Rcpp solution. Although it contains a loop, the loop runs in compiled C++, so it has far less overhead than an R-level loop and is likely about as fast as you are going to get:

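Rcpp::cppFunction("IntegerVector fill_column(IntegerVector v) {
  // Write into a copy: an integer input would otherwise be modified in place.
  IntegerVector out = clone(v);
  bool flag = false;   // true while inside a 1..4 run
  int counter = 1;     // next value to write
  for(int i = 0; i < out.length(); ++i) {
    if(flag){
      out[i] = counter++;
      if(counter == 5) {
        flag = false;
        counter = 1;
      }
    } else {
      if(v[i] == 1) {
        out[i] = counter++;
        flag = true;
      }
    }
  }
  return out;
  }")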

This allows you to use the function inside dplyr:

df %>% mutate(days = fill_column(signal))

#> # A tibble: 14 x 2
#>    signal  days
#>     <dbl> <int>
#>  1      0     0
#>  2      1     1
#>  3      0     2
#>  4      0     3
#>  5      1     4
#>  6      0     0
#>  7      0     0
#>  8      1     1
#>  9      1     2
#> 10      1     3
#> 11      1     4
#> 12      1     1
#> 13      1     2
#> 14      0     3
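
If compiling C++ is not an option, the same state machine can be sketched in base R with Reduce(). The names step_days and fill_column_r are hypothetical, and because this is still an R-level loop under the hood, expect it to be slower than the Rcpp version on long vectors.

```r
# Hypothetical names: step_days() and fill_column_r() are for illustration only.
# step_days() maps (previous days value, current signal) to the next days value.
step_days <- function(prev, s) {
  if (prev >= 1 && prev < 4) {
    prev + 1          # inside a run: keep counting, ignore the signal
  } else if (s == 1) {
    1                 # idle (or run just finished) and signal fired: start a new run
  } else {
    0                 # idle and no signal
  }
}

fill_column_r <- function(signal) {
  # init = 0 means "no run active before the first row"; [-1] drops that seed value
  Reduce(step_days, signal, accumulate = TRUE, init = 0)[-1]
}

signal <- c(0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0)
fill_column_r(signal)
#> [1] 0 1 2 3 4 0 0 1 2 3 4 1 2 3
```

Since it returns a vector of the same length as its input, it should drop into the same pipeline: df %>% mutate(days = fill_column_r(signal)).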
answered Sep 16 '22 19:09 by Allan Cameron