R: Conditional cumulative sum/rollover along a column

Question

I have a dataset in which I'm exploring the impact of capping a variable at a given value and rolling the excess into subsequent intervals. Conceptually I can see a few ways to do this with cumsum() or similar, but I'm struggling to see how to implement it in a logical way.

The input data isn't huge (tens of thousands of rows, not hundreds of thousands), so efficiency isn't crucial.

Reprex input data:

interval starting	kWh
2021-01-01 19:00	12.2
2021-01-01 19:30	14.7
2021-01-01 20:00	20.2
2021-01-01 20:30	30.7
2021-01-01 21:00	36.3
2021-01-01 21:30	36.7
2021-01-01 22:00	30.1
2021-01-01 22:30	26.3
2021-01-01 23:00	18.1
2021-01-01 23:30	15.8
2021-01-02 00:00	11.4
2021-01-02 00:30	10.2
2021-01-02 01:00	11.9
2021-01-02 01:30	12.3
2021-01-02 02:00	9.1
2021-01-02 02:30	8.6
2021-01-02 03:00	8.3
2021-01-02 03:30	10.1
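For anyone who wants to run the answer against this sample, here is a minimal sketch that rebuilds it as a data frame. The name d matches the answer below; the column name interval_starting and the UTC time zone are assumptions, since the question specifies neither.

```r
# Rebuild the sample input. `interval_starting` and tz = "UTC" are assumed,
# not taken from the question.
d <- data.frame(
  interval_starting = seq(
    as.POSIXct("2021-01-01 19:00", tz = "UTC"),
    by = "30 min",
    length.out = 18
  ),
  kWh = c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
          15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
)
```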

What I want to do is limit the value in the kWh column to a maximum of 20.0. If a value exceeds that, I'd like to roll the excess into the next interval, then the next, and so on, until all of the energy has been accounted for. That way the sum across a wide enough interval is always the same, but the peak is never above the limit.

Desired output:

interval starting	kWh	limit_kWh
2021-01-01 19:00	12.2	12.2
2021-01-01 19:30	14.7	14.7
2021-01-01 20:00	20.2	20.0
2021-01-01 20:30	30.7	20.0
2021-01-01 21:00	36.3	20.0
2021-01-01 21:30	36.7	20.0
2021-01-01 22:00	30.1	20.0
2021-01-01 22:30	26.3	20.0
2021-01-01 23:00	18.1	20.0
2021-01-01 23:30	15.8	20.0
2021-01-02 00:00	11.4	20.0
2021-01-02 00:30	10.2	20.0
2021-01-02 01:00	11.9	20.0
2021-01-02 01:30	12.3	20.0
2021-01-02 02:00	9.1	20.0
2021-01-02 02:30	8.6	17.7
2021-01-02 03:00	8.3	8.3
2021-01-02 03:30	10.1	10.1

So the total amount of energy over the time period is unchanged, but the peak energy is never above the specified limit.

Any help would be very much appreciated!

Noah · Accepted Answer

This is just a basic loop that does what you want. It is not particularly efficient, but I couldn't figure out a good way to make it faster using vectorization.

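# d is the question's data frame with a numeric kWh column
d$limit_kWh <- NA_real_   # create the output column up front
overflow <- 0             # excess carried in from earlier intervals
for (i in seq_len(nrow(d))) {
  total <- d$kWh[i] + overflow
  if (total > 20) {
    d$limit_kWh[i] <- 20
    overflow <- total - 20
  } else {
    d$limit_kWh[i] <- total
    overflow <- 0
  }
}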

Basically the amount above 20, if any, is stored in the overflow variable, which updates at each entry.
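To make the carry-over logic easier to reuse, here is a rough sketch of the same loop wrapped in a function with the cap as a parameter. The function name cap_with_rollover, its arguments, and the "leftover" attribute are all hypothetical and not part of the original answer. The attribute is there because any excess still pending after the last row never lands in an interval, so in that case the capped total falls short of the original total.

```r
# Hypothetical helper: name, arguments and the "leftover" attribute are
# illustrative assumptions, not part of the original answer.
cap_with_rollover <- function(x, limit = 20) {
  out <- numeric(length(x))
  overflow <- 0
  for (i in seq_along(x)) {
    total <- x[i] + overflow      # this interval plus carried-in excess
    out[i] <- min(total, limit)   # never exceed the cap
    overflow <- total - out[i]    # remainder to carry forward (0 if under the cap)
  }
  # Excess still pending after the last interval; nonzero means energy was lost.
  attr(out, "leftover") <- overflow
  out
}

# Small made-up example: the 5 and 15 units of excess drain into later intervals.
capped <- cap_with_rollover(c(10, 25, 30, 5, 0, 0), limit = 20)
# capped holds 10 20 20 20 0 0, with a leftover of 0
```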

Actually, here is an approximately 2x faster method that relies more on vectorization. It involves creating an overflow vector that contains the amount of overflow carried over from the previous interval.

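# overflow[i] is the excess carried into row i from row i - 1
overflow <- numeric(nrow(d))
for (i in seq_len(nrow(d))[-1]) {  # rows 2..n; empty if there is only one row
  overflow[i] <- max(d$kWh[i - 1] + overflow[i - 1] - 20, 0)
}
# Add the carried-in excess, then cap every row at 20 in one vectorized step
d$limit_kWh <- pmin(d$kWh + overflow, 20)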
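The same overflow vector can also be built without an explicit for loop by using base R's Reduce() with accumulate = TRUE. This is only a sketch of an alternative, not something from the answer, and it is unlikely to be faster since Reduce() still iterates internally. The variable names are assumptions, and the kWh values are copied from the question so the block runs on its own. It ends with a check of the two properties the question asks for.

```r
kWh <- c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
         15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
limit <- 20

# carry[1] is the initial overflow (0); carry[i + 1] is the excess left over
# after interval i, so carry has length(kWh) + 1 elements.
carry <- Reduce(function(ov, x) max(x + ov - limit, 0), kWh,
                init = 0, accumulate = TRUE)

# Overflow carried *into* each interval is carry without its last element.
overflow_in <- head(carry, -1)
limit_kWh <- pmin(kWh + overflow_in, limit)

# The two properties from the question: peak capped, total energy preserved.
peak_ok  <- max(limit_kWh) <= limit
total_ok <- isTRUE(all.equal(sum(limit_kWh), sum(kWh)))
```

If the data ended while excess was still pending, the last element of carry would be nonzero and total_ok would be FALSE.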
