R: Conditional cumulative sum/rollover along a column

Tags: r

I have a dataset where I'm trying to explore the impact of capping a variable at a given value and rolling the excess into subsequent intervals. I can conceptually see a few ways to do this with cumsum() or similar, but I'm struggling to see how to implement it in a logical way.

The input data isn't huge (tens of thousands of rows, not hundreds of thousands), so efficiency isn't crucial.

Reprex input data:

interval starting kWh
2021-01-01 19:00 12.2
2021-01-01 19:30 14.7
2021-01-01 20:00 20.2
2021-01-01 20:30 30.7
2021-01-01 21:00 36.3
2021-01-01 21:30 36.7
2021-01-01 22:00 30.1
2021-01-01 22:30 26.3
2021-01-01 23:00 18.1
2021-01-01 23:30 15.8
2021-01-02 00:00 11.4
2021-01-02 00:30 10.2
2021-01-02 01:00 11.9
2021-01-02 01:30 12.3
2021-01-02 02:00 9.1
2021-01-02 02:30 8.6
2021-01-02 03:00 8.3
2021-01-02 03:30 10.1

What I want to do is limit the value in the kWh column to a maximum of 20.0. If a value exceeds that, I'd like to roll the excess into the next interval, and then the next, and so on, until all of the energy has been accounted for. The sum across a wide enough interval should then always be the same, but the peak is never above the limit.

Desired output:

interval starting kWh limit_kWh
2021-01-01 19:00 12.2 12.2
2021-01-01 19:30 14.7 14.7
2021-01-01 20:00 20.2 20.0
2021-01-01 20:30 30.7 20.0
2021-01-01 21:00 36.3 20.0
2021-01-01 21:30 36.7 20.0
2021-01-01 22:00 30.1 20.0
2021-01-01 22:30 26.3 20.0
2021-01-01 23:00 18.1 20.0
2021-01-01 23:30 15.8 20.0
2021-01-02 00:00 11.4 20.0
2021-01-02 00:30 10.2 20.0
2021-01-02 01:00 11.9 20.0
2021-01-02 01:30 12.3 20.0
2021-01-02 02:00 9.1 20.0
2021-01-02 02:30 8.6 17.7
2021-01-02 03:00 8.3 8.3
2021-01-02 03:30 10.1 10.1

So in this output the total amount of energy over the time period is unchanged, but the peak energy is never above the specified limit.

Any help would be very much appreciated!

asked Jan 24 '23 06:01 by a_leemo


1 Answer

This is just a basic loop that does what you want. It is not particularly efficient, but I couldn't figure out a good way to make it faster using vectorization.

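# Carry any energy above the 20 kWh cap forward into later intervals.
d$limit_kWh <- numeric(nrow(d))
overflow <- 0
for (i in seq_len(nrow(d))) {
  total <- d$kWh[i] + overflow  # this interval's energy plus the carried-over excess
  if (total > 20) {
    d$limit_kWh[i] <- 20
    overflow <- total - 20
  } else {
    d$limit_kWh[i] <- total
    overflow <- 0
  }
}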

Basically the amount above 20, if any, is stored in the overflow variable, which updates at each entry.
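
To try the loop on the question's data, one possible setup is sketched below. The data frame name `d` follows the answer; the column name `interval_starting`, the helper name `cap_and_roll`, and its `cap` parameter are assumptions introduced here for illustration. The helper uses the same overflow logic, written with `min()` instead of an if/else.

```r
# Rebuild the question's sample data (the column name interval_starting is assumed).
d <- data.frame(
  interval_starting = seq(
    as.POSIXct("2021-01-01 19:00", tz = "UTC"),
    by = "30 min", length.out = 18
  ),
  kWh = c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
          15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
)

# Hypothetical helper: cap each value and carry the excess into later intervals.
cap_and_roll <- function(x, cap = 20) {
  out <- numeric(length(x))
  overflow <- 0
  for (i in seq_along(x)) {
    total <- x[i] + overflow       # this interval plus the carried-over excess
    out[i] <- min(total, cap)      # deliver at most the cap
    overflow <- total - out[i]     # whatever is left rolls forward
  }
  out
}

d$limit_kWh <- cap_and_roll(d$kWh)

# Sanity checks: the peak respects the cap and the total energy matches.
max(d$limit_kWh) <= 20
isTRUE(all.equal(sum(d$kWh), sum(d$limit_kWh)))
```

The totals only match when the overflow has fully drained by the last row, as it does for this sample. If the data ends while energy is still queued, the remainder is dropped.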


Actually, here is an approximately 2x faster method that relies more on vectorization. It still loops once, but only to build an overflow vector that holds the amount carried over from the previous interval; the capping itself is then done in a single vectorized step.

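# overflow[i] is the excess carried into interval i from the intervals before it.
overflow <- numeric(nrow(d))
for (i in seq_len(nrow(d))[-1]) {
  overflow[i] <- max(d$kWh[i - 1] + overflow[i - 1] - 20, 0)
}
# Add the carried-in excess to every interval at once, then cap at 20.
d$limit_kWh <- pmin(d$kWh + overflow, 20)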
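
The remaining loop can, in principle, be removed as well. The overflow recursion has the same form as a queue backlog (the Lindley recursion), which can be written in closed form with cumulative sums. The following is a sketch under that assumption and is not part of the original answer; the function name `cap_and_roll_vec` and its `cap` argument are hypothetical, and results may differ from the loop by floating-point rounding.

```r
# Hypothetical loop-free variant built on cumsum() and cummin().
cap_and_roll_vec <- function(x, cap = 20) {
  s <- cumsum(x - cap)                # running surplus relative to the cap
  backlog <- s - pmin(cummin(s), 0)   # overflow still waiting after each interval
  prev <- c(0, head(backlog, -1))     # overflow carried into each interval
  x + prev - backlog                  # energy delivered in each interval
}

# The kWh column from the question.
kWh <- c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
         15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
limit_kWh <- cap_and_roll_vec(kWh)

# Compare against the desired output listed in the question.
expected <- c(12.2, 14.7, rep(20, 13), 17.7, 8.3, 10.1)
all.equal(limit_kWh, expected)
```

Because `all.equal()` compares with a small tolerance, it is a safer check here than `identical()`, since the cumulative sums accumulate rounding error.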
answered Feb 04 '23 20:02 by Noah