Cumulative sum until maximum reached, then repeat from zero in the next row

I feel like this is a fairly easy question, but for the life of me I can't find the answer. I have a fairly standard data frame, and I am trying to sum a column of values until the sum reaches some threshold (either exactly or by exceeding it). At that point I want to put a 1 into a new column (labelled keep) and restart the sum at 0.

I have a column of minutes, a column of the differences between consecutive minutes, a keep column, and a cumulative sum column. (The example below is much cleaner than the actual full dataset.)

 minutes     difference     keep     difference_sum
 1052991158       0          0            0
 1052991338      180         0            180
 1052991518      180         0            360
 1052991698      180         0            540
 1052991878      180         0            720
 1052992058      180         0            900
 1052992238      180         0            1080
 1052992418      180         0            1260
 1052992598      180         0            1440
 1052992778      180         0            1620
 1052992958      180         0            1800

The difference_sum column was calculated with this code:

caribou.sub$difference_sum <- cumsum(caribou.sub$difference)
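For anyone who wants to try the approaches on this page, the sample table can be rebuilt as a data frame. This is a sketch that assumes the minutes are evenly spaced 180 apart, as in the table, and that the first row's difference is 0 because it has no predecessor.

```r
# Rebuild the 11-row sample shown in the table above
minutes <- seq(1052991158, 1052992958, by = 180)
caribou.sub <- data.frame(
  minutes    = minutes,
  difference = c(0, diff(minutes)),  # lag-1 difference; first row set to 0
  keep       = 0
)
caribou.sub$difference_sum <- cumsum(caribou.sub$difference)
caribou.sub
```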

What I would like is to run the code above with a condition: when the running sum reaches 1470 or more, put a 1 in the keep column, restart the sum from zero, and carry on this way through the whole dataset.
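One way to express this rule without an explicit loop is base R's `Reduce()` with `accumulate = TRUE`, which carries the running total from one row to the next. The sketch below assumes one reading of the requirement: the row that reaches 1470 keeps its full sum and gets the 1, and the following row starts again from its own difference. The helper name `reset_cumsum` and the 25-row series (the sample's 180-minute spacing, extended) are hypothetical.

```r
# Hypothetical helper: running sum that restarts after it reaches `limit`
reset_cumsum <- function(x, limit) {
  # Add each value to the previous total, unless that total had already
  # reached the limit, in which case start over from the current value
  Reduce(function(total, value) if (total >= limit) value else total + value,
         x, accumulate = TRUE)
}

# Hypothetical longer series with the same 180-minute spacing as the sample
minutes <- seq(1052991158, by = 180, length.out = 25)
difference <- c(0, diff(minutes))

difference_sum <- reset_cumsum(difference, 1470)
keep <- as.integer(difference_sum >= 1470)  # 1 on each row that hit the limit
which(keep == 1)
```

Here the first flag lands on row 10, where the sum first reaches 1620 (nine steps of 180), and row 11 starts over at 180.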

Thanks in advance, and if you need any more information let me know.

Ayden

asked Mar 17 '13 at 22:03 by HeidelbergSlide



2 Answers

I still don't understand when the sum should restart, and whether it should restart at zero. An example of the desired result would help greatly.

Nonetheless, I can't help but think that simple indexing and subtraction would be a straightforward way of doing this. On the sample data in the question, the code below gives the same result as @Henrik's solution.

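df$difference_sum <- cumsum(df$difference)      # ordinary running total
step <- (df$difference_sum %/% 1470) + 1         # 1470-wide band each row falls into
k <- which(diff(step) > 0) + 1                   # rows where a new band starts
df$keep <- 0
df$keep[k] <- 1                                  # flag those rows
step[k] <- step[k] - 1                           # a flagged row still belongs to the previous group
df$difference_sum <- df$difference_sum - c(0, df$difference_sum[k])[step]  # subtract the total at the previous flagged row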
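The equivalence with @Henrik's loop holds on the 11-row sample, but the two can part ways on longer data: `%/% 1470` groups rows by fixed multiples of 1470 of the overall running total rather than restarting from zero after each flagged row. A minimal check, assuming the same 180-minute differences extended to 25 rows (hypothetical data), runs both side by side:

```r
# Hypothetical 25-row series with the sample's 180-minute spacing
minutes <- seq(1052991158, by = 180, length.out = 25)
df <- data.frame(difference = c(0, diff(minutes)))

# Indexing approach from this answer
df$difference_sum <- cumsum(df$difference)
step <- (df$difference_sum %/% 1470) + 1
k <- which(diff(step) > 0) + 1
df$keep <- 0
df$keep[k] <- 1
step[k] <- step[k] - 1
df$difference_sum <- df$difference_sum - c(0, df$difference_sum[k])[step]

# Explicit reset loop, as in @Henrik's answer
loop_keep <- numeric(nrow(df))
current.sum <- 0
for (i in seq_len(nrow(df))) {
  current.sum <- current.sum + df$difference[i]
  if (current.sum >= 1470) {
    loop_keep[i] <- 1
    current.sum <- 0
  }
}

which(df$keep == 1)    # rows 10 and 18
which(loop_keep == 1)  # rows 10 and 19
df$difference_sum[18]  # 1440, below the 1470 threshold
```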
answered Oct 11 '22 at 13:10 by Aaron (who has since left Stack Overflow)



I think this is best done with a for loop; I can't think of a function that does it out of the box. The following should do what you want (if I understand you correctly).

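caribou.sub$keep <- 0                      # start with no rows flagged
current.sum <- 0
for (i in seq_len(nrow(caribou.sub))) {
    current.sum <- current.sum + caribou.sub[i, "difference"]
    caribou.sub[i, "difference_sum"] <- current.sum   # record the sum before any reset
    if (current.sum >= 1470) {
        caribou.sub[i, "keep"] <- 1        # threshold reached: flag this row
        current.sum <- 0                   # and restart from zero on the next row
    }
}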

Feel free to comment if it does not do exactly what you want. But as alexwhan pointed out, your description is not completely clear.

answered Oct 11 '22 at 14:10 by Henrik
