I have a dataset where I'm trying to explore the impact of capping a variable at a given value and rolling the excess into subsequent intervals. I can conceptually see a few ways to do this with cumsum() or similar, but I'm struggling to see how to implement it in a logical way.
The input data isn't huge (tens of thousands of rows, not hundreds of thousands), so efficiency isn't crucial.
Reprex input data:
interval starting | kWh |
---|---|
2021-01-01 19:00 | 12.2 |
2021-01-01 19:30 | 14.7 |
2021-01-01 20:00 | 20.2 |
2021-01-01 20:30 | 30.7 |
2021-01-01 21:00 | 36.3 |
2021-01-01 21:30 | 36.7 |
2021-01-01 22:00 | 30.1 |
2021-01-01 22:30 | 26.3 |
2021-01-01 23:00 | 18.1 |
2021-01-01 23:30 | 15.8 |
2021-01-02 00:00 | 11.4 |
2021-01-02 00:30 | 10.2 |
2021-01-02 01:00 | 11.9 |
2021-01-02 01:30 | 12.3 |
2021-01-02 02:00 | 9.1 |
2021-01-02 02:30 | 8.6 |
2021-01-02 03:00 | 8.3 |
2021-01-02 03:30 | 10.1 |
What I want to do is limit the value in the kWh column to a maximum of 20.0. If a value exceeds that, I'd like to roll the excess into the next interval, and then the next, and so on, until all of the energy has been accounted for. The sum across a wide enough window should stay the same, but the peak should never be above the limit.
Desired output:
interval starting | kWh | limit_kWh |
---|---|---|
2021-01-01 19:00 | 12.2 | 12.2 |
2021-01-01 19:30 | 14.7 | 14.7 |
2021-01-01 20:00 | 20.2 | 20.0 |
2021-01-01 20:30 | 30.7 | 20.0 |
2021-01-01 21:00 | 36.3 | 20.0 |
2021-01-01 21:30 | 36.7 | 20.0 |
2021-01-01 22:00 | 30.1 | 20.0 |
2021-01-01 22:30 | 26.3 | 20.0 |
2021-01-01 23:00 | 18.1 | 20.0 |
2021-01-01 23:30 | 15.8 | 20.0 |
2021-01-02 00:00 | 11.4 | 20.0 |
2021-01-02 00:30 | 10.2 | 20.0 |
2021-01-02 01:00 | 11.9 | 20.0 |
2021-01-02 01:30 | 12.3 | 20.0 |
2021-01-02 02:00 | 9.1 | 20.0 |
2021-01-02 02:30 | 8.6 | 17.7 |
2021-01-02 03:00 | 8.3 | 8.3 |
2021-01-02 03:30 | 10.1 | 10.1 |
So in this output the total amount of energy over the time period is unchanged, but the peak energy is never above the specified limit.
Any help would be very much appreciated!
This is just a basic loop that does what you want. It is not particularly efficient, but I couldn't figure out a good way to make it faster using vectorization.
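# Assumes a data frame `d` with a numeric `kWh` column
d$limit_kWh <- NA_real_  # create the output column up front
overflow <- 0            # energy carried over from earlier intervals
for (i in seq_len(nrow(d))) {
  if (d$kWh[i] + overflow > 20) {
    # Over the cap: emit 20 and carry the rest forward
    d$limit_kWh[i] <- 20
    overflow <- d$kWh[i] + overflow - 20
  } else {
    # Under the cap: emit everything, nothing left to carry
    d$limit_kWh[i] <- d$kWh[i] + overflow
    overflow <- 0
  }
}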
Basically, the amount above 20, if any, is stored in the overflow variable, which is updated at each row and added to the next interval's value.
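To try the loop on the question's data, something like the following sketch should work. It assumes the table is held in a data frame named d (as the loop above does); the interval_starting column name and the limit variable are illustrative choices, and the if/else is written with min() and max(), which should be equivalent.

```r
# Assumed setup: the question's table as a data frame named `d`.
# `interval_starting` is an illustrative column name.
d <- data.frame(
  interval_starting = seq(as.POSIXct("2021-01-01 19:00", tz = "UTC"),
                          by = "30 min", length.out = 18),
  kWh = c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
          15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
)

limit <- 20              # the cap from the question
d$limit_kWh <- NA_real_  # output column
overflow <- 0            # energy carried over from earlier intervals

for (i in seq_len(nrow(d))) {
  available <- d$kWh[i] + overflow          # this interval plus carry-over
  d$limit_kWh[i] <- min(available, limit)   # emit at most `limit`
  overflow <- max(available - limit, 0)     # carry the remainder forward
}

# Sanity checks: energy is conserved once any remaining carry-over is
# included, and no interval exceeds the cap.
conserved <- isTRUE(all.equal(sum(d$kWh), sum(d$limit_kWh) + overflow))
capped <- max(d$limit_kWh) <= limit
print(d)
```

If overflow is still positive after the last row, the data ended before all of the excess could be placed, so the two column sums will differ by that amount.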
Actually, here is an approximately 2x faster method that relies more on vectorization. It involves creating an overflow vector that contains the amount of overflow carried in from the previous interval.
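# overflow[i] is the excess carried into interval i from interval i - 1
overflow <- numeric(nrow(d))
# seq_len(...)[-1] is 2, 3, ..., nrow(d), and is empty for fewer than 2 rows
for (i in seq_len(nrow(d))[-1]) {
  overflow[i] <- max(d$kWh[i - 1] + overflow[i - 1] - 20, 0)
}
# Add the carried-in excess, then cap every interval at 20 in one vectorized step
d$limit_kWh <- pmin(d$kWh + overflow, 20)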
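As a variation on the same idea (not part of the original answer), the overflow vector can also be built with base R's Reduce() and accumulate = TRUE, which avoids the explicit for loop. The sketch below wraps this in a helper named cap_and_roll, which is a hypothetical name chosen for illustration. Reduce() still walks the vector sequentially, so this is about readability rather than any guaranteed speed-up.

```r
# Hypothetical helper: cap `x` at `limit`, carrying any excess forward.
cap_and_roll <- function(x, limit = 20) {
  # With `init` and accumulate = TRUE, Reduce() returns length(x) + 1 values:
  # element 1 is the initial carry (0), element i + 1 is the carry left
  # over after interval i.
  carry <- Reduce(
    function(prev_carry, value) max(prev_carry + value - limit, 0),
    x, accumulate = TRUE, 0
  )
  # The carry entering interval i is carry[i]; the final element (whatever
  # is left after the last interval) is dropped here.
  carried_in <- carry[seq_along(x)]
  pmin(x + carried_in, limit)
}

# Values from the question's table
kWh <- c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
         15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
limit_kWh <- cap_and_roll(kWh, limit = 20)
print(limit_kWh)
```

With a data frame d as in the answer, this would be used as d$limit_kWh <- cap_and_roll(d$kWh).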