R: Conditional summation of a numeric vector

Tags: loops, r

I have vectors that have numeric values. For example:

inVector <- c(2, -10, 5, 34, 7)

I need to transform this so that whenever I encounter a negative element, it is summed with the subsequent elements until the running sum turns positive:

outVector <- c(2, 0, 0, 29, 7)

Every element absorbed into the running sum is set to zero, so the overall sum is preserved. Here elements 2 and 3 become zero and the fourth element is 29 = -10 + 5 + 34. I tried a for loop like this:

outVector <- numeric(length = length(inVector))

for(i in 1:length(inVector)) {
   outVector <- inVector
   outVector[i] <- ifelse(outVector[i] < 0, 0, outVector[i])
   outVector[i + 1] <- ifelse(outVector[i] == 0, sum(inVector[i:(i+1)]), outVector[i + 1])
   outVector <- outVector[1:length(inVector)]
   }

but that didn't work. I would be most interested in a solution that also works in a dplyr pipe.

asked Aug 23 '16 13:08 by Antti



2 Answers

If we want to optimize, we can use Reduce to iterate through the vector, which was faster than the explicit loop in the test below:

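# Helper function
zeroElement <- function(vec) {
  # Carry a negative running sum forward; start over once it is non-negative
  r <- Reduce(function(x, y) if (x >= 0) y else sum(x, y), vec, accumulate = TRUE)
  # Any value still negative was carried into a later element, so zero it
  r[r < 0] <- 0
  return(r)
}

# Use function
zeroElement(inVector)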
#[1]  2  0  0 29  7
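
To see why this works, it can help to look at the intermediate values that Reduce produces before the negatives are zeroed. The sketch below is only an illustration (the names carryStep and carry are made up here): each accumulated value is either a fresh element, when the previous value was non-negative, or the running sum carried over from a negative one.

```r
inVector <- c(2, -10, 5, 34, 7)

# Same step as in zeroElement(): start over from y when the previous
# value is non-negative, otherwise keep carrying the deficit forward.
carryStep <- function(x, y) if (x >= 0) y else x + y

# accumulate = TRUE keeps every intermediate value, not just the last one.
carry <- Reduce(carryStep, inVector, accumulate = TRUE)
print(carry)
# [1]   2 -10  -5  29   7

# Zeroing the negative carries gives the requested output.
print(pmax(carry, 0))
# [1]  2  0  0 29  7
```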

Speed test: zeroElement was roughly 25% faster than MakeNonNeg (from the other answer) on BigVec:

t3 <- MakeNonNeg(BigVec)
t4 <- zeroElement(BigVec)
all.equal(t3, t4)
#[1] TRUE
library(microbenchmark)
microbenchmark(
  makeNonNeg = MakeNonNeg(BigVec),
  zeroElement = zeroElement(BigVec),
  times=10)
# Unit: seconds
#        expr      min       lq     mean   median       uq      max neval cld
#  makeNonNeg 2.047484 2.099289 2.195988 2.111135 2.248381 2.531009    10   b
# zeroElement 1.529257 1.580789 1.666000 1.664855 1.725528 1.837825    10  a
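
One caveat, offered as an observation rather than a benchmark: the all.equal check passes for BigVec, but the two functions are not identical in general. If the vector ends while the running sum is still negative, MakeNonNeg leaves the deficit in the last element (so the total is preserved), whereas zeroElement sets it to zero. A minimal sketch with a made-up input v; both functions are repeated so the snippet runs on its own:

```r
zeroElement <- function(vec) {
  r <- Reduce(function(x, y) if (x >= 0) y else sum(x, y), vec, accumulate = TRUE)
  r[r < 0] <- 0
  r
}

MakeNonNeg <- function(v) {
  size <- length(v)
  myOut <- as.numeric(v)
  if (size > 1L) {
    for (i in 1:(size - 1L)) {
      if (myOut[i] >= 0) next
      myOut[i + 1L] <- myOut[i] + myOut[i + 1L]
      myOut[i] <- 0
    }
  }
  myOut
}

v <- c(5, -10, 3)      # hypothetical input: the deficit is never recovered

loopRes   <- MakeNonNeg(v)    # 5 0 -7, total stays at -2
reduceRes <- zeroElement(v)   # 5 0  0, total becomes 5

print(loopRes)
print(reduceRes)
```

Which behaviour is preferable depends on whether preserving the overall sum or a strictly non-negative result matters more; the question does not say.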

Session info for comparison:

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
answered Oct 06 '22 20:10 by Pierre L


Try this:

MakeNonNeg <- function(v) {
    size <- length(v)
    myOut <- as.numeric(v)
    if (size > 1L) {
        for (i in 1:(size-1L)) {
            if (myOut[i] >= 0) {next}
            myOut[i+1L] <- myOut[i]+myOut[i+1L]
            myOut[i] <- 0
        }
    }
    myOut
}

MakeNonNeg(inVector)
[1]  2  0  0 29  7
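
Since the question also asks about dplyr: because MakeNonNeg takes a vector and returns a vector of the same length, it should drop into mutate() unchanged. This is a sketch under assumptions: the data frame df and its column value are made up, and dplyr has to be installed (otherwise the code falls back to base transform()).

```r
MakeNonNeg <- function(v) {
  size <- length(v)
  myOut <- as.numeric(v)
  if (size > 1L) {
    for (i in 1:(size - 1L)) {
      if (myOut[i] >= 0) next
      myOut[i + 1L] <- myOut[i] + myOut[i + 1L]
      myOut[i] <- 0
    }
  }
  myOut
}

# Hypothetical data frame holding the example vector in a column
df <- data.frame(value = c(2, -10, 5, 34, 7))

if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  # mutate() passes the whole column to MakeNonNeg as one vector
  res <- df %>% mutate(out = MakeNonNeg(value))
} else {
  # Base R fallback when dplyr is not available
  res <- transform(df, out = MakeNonNeg(value))
}

print(res)
```

With group_by() in front of mutate(), the function would be applied to each group separately, so carries would not cross group boundaries.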

Below is a larger example, which also checks that the overall sum is preserved:

set.seed(4242)

BigVec <- sample(-40000:100000, 100000, replace = TRUE)
gmp::sum.bigz(BigVec)
Big Integer ('bigz') :
    [1] 2997861106

t3 <- MakeNonNeg(BigVec)
gmp::sum.bigz(t3)
Big Integer ('bigz') :
    [1] 2997861106

BigVec[1:20]
[1]  98056   8680  -7814  53620  58390  90832  74970 -16392  52648  83779 -17229  38484 -36589  75156  71200  95968 -11599  57705
[19]  19209 -21596

t3[1:20]
[1] 98056  8680     0 45806 58390 90832 74970     0 36256 83779     0 21255     0 38567 71200 95968     0 46106 19209     0

Here is my system info:

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Below are timings for both functions with JIT disabled.

microbenchmark(
    makeNonNeg = MakeNonNeg(BigVec),
    zeroElement = zeroElement(BigVec),
    times=10)
Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval
 makeNonNeg 254.1255 255.8430 267.9527 258.6369 277.0222 303.6516    10
zeroElement 152.0358 164.7988 175.3191 166.4948 198.3855 209.8739    10

With JIT enabled, the results for MakeNonNeg are very different. The results for zeroElement, however, barely change (my guess is that since Reduce does most of the work and is already byte-compiled, there is not much room for improvement).

library(compiler)
enableJIT(3)
[1] 0

microbenchmark(
    makeNonNeg = MakeNonNeg(BigVec),
    zeroElement = zeroElement(BigVec),
    times=10)
Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval
 makeNonNeg  11.20514  11.55366  12.76953  11.84655  12.20554  20.60036    10
zeroElement 144.15123 149.33591 163.66421 157.34711 176.20139 198.57268    10

So, with JIT disabled, zeroElement is about 50% faster and when JIT is enabled, MakeNonNeg is about 13x faster.
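
For context on these numbers: since R 3.4.0 the JIT compiler is enabled by default at level 3, so on a current R the loop version is normally byte-compiled without any extra step. On older versions, a function can also be compiled explicitly with compiler::cmpfun(). A minimal sketch; the name MakeNonNegCmp and the test vector v are made up here, and no timings are claimed:

```r
library(compiler)

MakeNonNeg <- function(v) {
  size <- length(v)
  myOut <- as.numeric(v)
  if (size > 1L) {
    for (i in 1:(size - 1L)) {
      if (myOut[i] >= 0) next
      myOut[i + 1L] <- myOut[i] + myOut[i + 1L]
      myOut[i] <- 0
    }
  }
  myOut
}

# A negative argument reports the current JIT level without changing it
jitLevel <- enableJIT(-1)
print(jitLevel)

# Byte-compile the loop version explicitly (hypothetical name)
MakeNonNegCmp <- cmpfun(MakeNonNeg)

# Compilation should change speed only, never the result
set.seed(4242)
v <- sample(-40000:100000, 1000, replace = TRUE)
sameResult <- identical(MakeNonNeg(v), MakeNonNegCmp(v))
print(sameResult)
```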

answered Oct 06 '22 18:10 by Joseph Wood