Conditionally replace elements of a vector based on an index

Tags:

r

It's best explained with an example.

I have a vector, or column from data.frame named vec:

vec <- c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA)

I would like a vectorized process (not a for loop) that replaces the three NA values following each 1 with 1.

The end vector would be:

c(NA, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, NA, NA)

If we had:

vec <- c(NA, NA, 1, NA, 1, NA, NA, 1, NA, NA, NA, NA, NA, NA)

The end vector would look like:

c(NA, NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, NA, NA, NA)

A very badly written solution is:

vec2 <- vec
for(i in seq_along(vec)){
  if(!is.na(vec[i])) vec2[i] <- 1
  if(i>3){
    if(!is.na(vec[i-1])) vec2[i] <- 1
    if(!is.na(vec[i-2])) vec2[i] <- 1
    if(!is.na(vec[i-3])) vec2[i] <- 1
  }
  if(i==3){
    if(!is.na(vec[i-1])) vec2[i] <- 1
    if(!is.na(vec[i-2])) vec2[i] <- 1
  }
  if(i==2){
    if(!is.na(vec[i-1])) vec2[i] <- 1
  }
}

asked Jul 04 '17 14:07

dimitris_ps

4 Answers

Another option:

`[<-`(vec,c(outer(which(vec==1),1:3,"+")),1)
# [1] NA NA  1  1  1  1 NA  1  1  1  1 NA NA NA

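Although the above works with the examples, it extends vec beyond its original length if a 1 occurs within the last three positions, because the computed indices then exceed length(vec). It is better to drop the out-of-range indices and wrap the whole thing in a function: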

threeNAs<-function(vec) {
    ind<-c(outer(which(vec==1),1:3,"+"))
    ind<-ind[ind<=length(vec)]
    `[<-`(vec,ind,1)
}
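
To make the length issue concrete, here is a small sketch. The input vector is made up for illustration (it is not one of the examples from the question); it has a 1 in the second-to-last position.

```r
# Made-up input with a 1 near the end (not from the question)
vec <- c(NA, 1, NA, NA, NA, 1, NA)

# Unchecked version: the 1 at position 6 produces indices 7, 8 and 9,
# so the result is longer than the input
stretched <- `[<-`(vec, c(outer(which(vec == 1), 1:3, "+")), 1)
length(stretched)

# Checked version, same logic as the function above
threeNAs <- function(vec) {
    ind <- c(outer(which(vec == 1), 1:3, "+"))
    ind <- ind[ind <= length(vec)]   # drop indices past the end of vec
    `[<-`(vec, ind, 1)
}
checked <- threeNAs(vec)
length(checked)                      # same length as the input
```

The only difference between the two calls is the filter on ind, which is what keeps the result the same length as the input.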

answered Oct 05 '22 21:10

nicola

Another fast solution:

vec[rep(which(vec == 1), each = 3) + c(1:3)] <- 1

which gives:

> vec
 [1] NA NA  1  1  1  1 NA  1  1  1  1 NA NA NA
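
For readers wondering how the one-liner builds its indices, the steps can be spelled out as below. The intermediate names (pos, base, idx) and the final bounds check are additions for illustration only; the check is optional and just prevents vec from being extended when a 1 sits near the end.

```r
vec <- c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA)

pos  <- which(vec == 1)          # positions of the 1s: 3 8
base <- rep(pos, each = 3)       # each position three times: 3 3 3 8 8 8
idx  <- base + 1:3               # 1:3 is recycled: 4 5 6 9 10 11

idx <- idx[idx <= length(vec)]   # optional guard, not in the one-liner above
vec[idx] <- 1
vec
```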

Benchmarking is only really useful when done on larger datasets. Here is a benchmark of the posted solutions on a vector 10,000 times longer (the example vector repeated 1e4 times):

library(microbenchmark)

microbenchmark(ans.jaap = {vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4); 
                           vec[rep(which(vec == 1), each = 3) + c(1:3)] <- 1},
               ans.989 = {vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4);
                          r <- which(vec==1);
                          vec[c(mapply(seq, r, r+3))] <- 1},
               ans.sotos = {vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4);
                            vec[unique(as.vector(t(sapply(which(vec == 1), function(i) seq(i+1, length.out = 3)))))] <- 1},
               ans.gregor = {vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4);
                             vec[is.na(vec)] <- 0;
                             n <- length(vec);
                             vec <- vec + c(0, vec[1:(n-1)]) + c(0, 0, vec[1:(n-2)]) + c(0, 0, 0, vec[1:(n-3)]);
                             vec[vec == 0] <- NA},
               ans.moody = {vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4);
                            output <- sapply(1:length(vec),function(i){any(!is.na(vec[max(0,i-3):i]))});
                            output[output] <- 1;
                            output[output==0] <- NA},
               ans.nicola = {vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4);
                             `[<-`(vec,c(outer(which(vec==1),1:3,"+")),1)})

which gives the following benchmark:

Unit: microseconds
       expr        min         lq       mean     median         uq        max neval   cld
   ans.jaap   1778.905   1937.414   3064.686   2100.595   2257.695  86233.593   100 a    
    ans.989  87688.166  89638.133  96992.231  90986.269  93326.393 182431.366   100   c  
  ans.sotos 125344.157 127968.113 132386.664 130117.438 132951.380 214460.174   100    d 
 ans.gregor   4036.642   5824.474  10861.373   6533.791   7654.587  87806.955   100  b   
  ans.moody 173146.810 178369.220 183698.670 180318.799 184000.062 264892.878   100     e
 ans.nicola    966.927   1390.486   1723.395   1604.037   1904.695   3310.203   100 a
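
Note that each expression above also rebuilds the input vector, so the timings include that setup cost. Before comparing speed it is also worth checking that the approaches agree. The following is a rough sketch of such a check for three of them, using only base R; the variable names are made up here, and the last line of the shift-based variant is adapted so that it does not rely on indexing with NA.

```r
x <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA), 1e4)

# rep()/recycling approach
jaap <- x
jaap[rep(which(jaap == 1), each = 3) + 1:3] <- 1

# shift-and-add approach
gregor <- x
gregor[is.na(gregor)] <- 0
n <- length(gregor)
gregor <- gregor + c(0, gregor[1:(n-1)]) + c(0, 0, gregor[1:(n-2)]) +
          c(0, 0, 0, gregor[1:(n-3)])
gregor[gregor == 0] <- NA
gregor[!is.na(gregor)] <- 1

# outer() approach
nicola <- `[<-`(x, c(outer(which(x == 1), 1:3, "+")), 1)

# stops with an error if any pair disagrees
stopifnot(identical(jaap, gregor), identical(jaap, nicola))
```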

answered Oct 04 '22 21:10

Jaap

What really is 'vectorised', if not a loop written in a C-language?

Here's a C++ loop that benchmarks well.

vec <- c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA)

library(Rcpp)

cppFunction('NumericVector fixVec(NumericVector myVec){

    int n = myVec.size();
    int foundCount = 0;

    for(int i = 0; i < n; i++){
      if(myVec[i] == 1) foundCount = 1; 

      if(ISNA(myVec[i])){
        if(foundCount >= 1 && foundCount <= 3){
          myVec[i] = 1;
          foundCount++;
        }
      }
    }
    return myVec;
    }')

fixVec(vec)
# [1] NA NA  1  1  1  1 NA  1  1  1  1 NA NA NA
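
One thing to be aware of: an Rcpp NumericVector wraps the R vector rather than copying it, so fixVec as written also changes vec itself when vec is a double vector. If that side effect is unwanted, working on a copy is one option. The sketch below assumes Rcpp and a C++ compiler are available; fixVecCopy is a made-up name, not part of the code above.

```r
library(Rcpp)

# Hypothetical variant of fixVec that leaves its input untouched
cppFunction('NumericVector fixVecCopy(NumericVector myVec){

    NumericVector out = clone(myVec);   // deep copy of the input

    int n = out.size();
    int foundCount = 0;   // 0: no 1 seen yet, 1 to 3: still filling NAs

    for(int i = 0; i < n; i++){
      if(out[i] == 1) foundCount = 1;

      if(ISNA(out[i]) && foundCount >= 1 && foundCount <= 3){
        out[i] = 1;
        foundCount++;
      }
    }
    return out;
    }')

vec <- c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA)
res <- fixVecCopy(vec)
is.na(vec[4])   # the original still has NA at position 4
```

The copy costs extra time and memory, so this variant would be expected to benchmark somewhat slower than the in-place version.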

Benchmarks

library(microbenchmark)

microbenchmark(
    ans.jaap = {
        vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4)
        vec[rep(which(vec == 1), each = 4) + c(0:3)] <- 1
    },

    ans.nicola = {
        vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4)
        `[<-`(vec,c(outer(which(vec==1),0:3,"+")),1)
    },

    ans.symbolix = {
        vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4)
        vec <- fixVec(vec)
    }
)

# Unit: microseconds
# expr              min       lq      mean   median        uq       max neval
# ans.jaap     2017.789 2264.318 2905.2437 2579.315 3588.4850  4667.249   100
# ans.nicola   1242.002 1626.704 3839.4768 2095.311 3066.4795 81299.962   100
# ans.symbolix  504.577  533.426  838.5661  718.275  966.9245  2354.373   100


vec <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4)
vec <- fixVec(vec)

vec2 <- rep(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA),1e4)
vec2[rep(which(vec2 == 1), each = 4) + c(0:3)] <- 1

identical(vec, vec2)
# [1] TRUE

answered Oct 02 '22 21:10

SymbolixAU

The following code does what you asked for. It shifts the vector by one, two and three positions and adds the shifted copies to the original, so every position within three steps after a 1 ends up non-zero:

vec[is.na(vec)] <- 0                                 
n <- length(vec)                                     
vec <- vec + c(0, vec[1:(n-1)]) + c(0, 0, vec[1:(n-2)]) + c(0, 0, 0, vec[1:(n-3)])  
vec[vec == 0] <- NA                                    
vec[vec != 0] <- 1                                     

# vec                    |   0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0
# c(0, vec[1:(n-1)])     | + 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
# c(0, 0, vec[1:(n-2)])  | + 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0
# c(0,0,0,vec[1:(n-3)])  | + 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0
#                        |-------------------------------------------
#                        |   0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0
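
The same shifting idea can be wrapped in a function with an adjustable window. This is only a sketch: fill_after and its k argument are made-up names, it uses logical OR on shifted copies instead of addition, and it loops over the k shifts (not over the elements of the vector).

```r
# Hypothetical helper: set the k positions after each 1 to 1
fill_after <- function(vec, k = 3) {
    n   <- length(vec)
    hit <- !is.na(vec) & vec == 1            # TRUE where a 1 is observed
    out <- hit
    for (s in seq_len(max(0, min(k, n - 1)))) {
        # shift hit right by s positions, padding the front with FALSE
        out <- out | c(rep(FALSE, s), hit[seq_len(n - s)])
    }
    vec[out] <- 1
    vec
}

fill_after(c(NA, NA, 1, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA))
fill_after(c(NA, NA, 1, NA, 1, NA, NA, 1, NA, NA, NA, NA, NA, NA))
```

Because the shifted copies are truncated to the original length, this version never extends the vector, and it does not overwrite the input with zeros along the way.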

answered Oct 02 '22 21:10

Gregor de Cillia
