I'd like to use Pandas to implement a function that keeps a running balance, but I'm not sure it can be vectorized for speed.
In short, the problem I'm trying to solve is to keep track consumption, generation, and the "bank" of over-generation.
"consumption" means how much is used in a given time period.
"generation" is how much is generated.
When generation is greater than consumption then the homeowner can "bank" the extra generation, to be applied in subsequent time periods. they can apply it if their consumption exceeds their generation for a later month.
This will be for many entities, hence the "id" field. The time sequence is defined by "order"
Very basic example:
Month 1 generates 13 consumes 8 -> therefore banks 5
month 2 generates 8 consumes 10 -> therefore uses 2 from the the bank, and still has 3 left over
Month 3 generates 7 consumes 20 -> exhausts remaining 3 from bank, and has no bank left over.
Code import numpy as np import pandas as pd
id = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2]
order = [1,2,3,4,5,6,7,8,9,18,11,12,13,14,15,1,2,3,4,5,6,7,8,9,10,11]
consume = [10, 17, 20, 11, 17, 19, 20, 10, 10, 19, 14, 12, 10, 14, 13, 19, 12, 17, 12, 18, 15, 14, 15, 20, 16, 15]
generate = [20, 16, 17, 21, 9, 13, 10, 16, 12, 10, 9, 9, 15, 13, 100, 15, 18, 16, 10, 16, 12, 12, 13, 20, 10, 15]
df = pd.DataFrame(list(zip(id, order, consume, generate)),
columns =['id','Order','Consume', 'Generate'])
begin_bal = [0,10,9,6,16,8,2,0,6,8,0,0,0,5,4,0,0,6,5,3,1,0,0,0,0,0]
end_bal = [10,9,6,16,8,2,0,6,8,0,0,0,5,4,91,0,6,5,3,1,0,0,0,0,0,0]
withdraw = [0,1,3,0,8,6,2,0,0,8,0,0,0,1,4,0,0,1,2,2,1,0,0,0,0,0]
df_solution = pd.DataFrame(list(zip(id, order, consume, generate, begin_bal, end_bal, withdraw)),
columns =['id','Order','Consume', 'Generate', 'begin_bal', 'end_bal', 'Withdraw'])
def bank(df):
# deposit all excess when generation exceeds consumption
deposit = (df['Generate'] > df['Consume']) * (df['Generate'] - df['Consume'])
df['end_bal'] = 0
# beginning balance = prior period ending balance
df = df.sort_values(by=['id', 'Order'])
df['begin_bal'] = df['end_bal'].shift(periods=1)
df.loc[df['Order']==1, 'begin_bal'] = 0 # set first month beginning balance of each customer to 0
# calculate withdrawal
df['Withdraw'] = 0
ok_to_withdraw = df['Consume'] > df['Generate']
df.loc[ok_to_withdraw,'Withdraw'] = np.minimum(df.loc[ok_to_withdraw, 'begin_bal'],
df.loc[ok_to_withdraw, 'Consume'] -
df.loc[ok_to_withdraw, 'Generate'] -
deposit[ok_to_withdraw])
# ending balance = beginning balance + deposit - withdraw
df['end_bal'] = df['begin_bal'] + deposit - df['Withdraw']
return df
df = bank(df)
df.head()
id Order Consume Generate end_bal begin_bal Withdraw
0 1 1 10 20 10.0 0.0 0.0
1 1 2 17 16 0.0 0.0 0.0
2 1 3 20 17 0.0 0.0 0.0
3 1 4 11 21 10.0 0.0 0.0
4 1 5 17 9 0.0 0.0 0.0
df_solution.head()
id Order Consume Generate begin_bal end_bal Withdraw
0 1 1 10 20 0 10 0
1 1 2 17 16 10 9 1
2 1 3 20 17 9 6 3
3 1 4 11 21 6 16 0
4 1 5 17 9 16 8 9
I tried to implement with various iterations of cumsum and shift . . . but the fact remains that value of each row seems like it needs to be recalculated based on the prior row, and I'm not sure this is possible to vectorize.
Code to generate some test datasets:
def generate_testdata():
random.seed(42*42)
np.random.seed(42*42)
numids = 10
numorders = 12
id = []
order = []
for i in range(numids):
id = id + [i]*numorders
order = order + list(range(1,numorders+1))
consume = np.random.uniform(low = 10, high = 40, size = numids*numorders)
generate = np.random.uniform(low = 10, high = 40, size = numids*numorders)
df = pd.DataFrame(list(zip(id, order, consume, generate)),
columns =['id','Order','Consume', 'Generate'])
return df
Here is a numpy-ish approach, mostly because I'm not that familiar with pandas:
The idea is to first compute the free cumsum and then to subtract the cumulative minimum if it is negative.
import numpy as np
import pandas as pd
id = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2]
order = [1,2,3,4,5,6,7,8,9,18,11,12,13,14,15,1,2,3,4,5,6,7,8,9,10,11]
consume = [10, 17, 20, 11, 17, 19, 20, 10, 10, 19, 14, 12, 10, 14, 13, 19, 12, 17, 12, 18, 15, 14, 15, 20, 16, 15]
generate = [20, 16, 17, 21, 9, 13, 10, 16, 12, 10, 9, 9, 15, 13, 8, 15, 18, 16, 10, 16, 12, 12, 13, 20, 10, 15]
df = pd.DataFrame(list(zip(id, order, consume, generate)),
columns =['id','Order','Consume', 'Generate'])
begin_bal = [0,10,9,6,16,8,2,0,6,8,0,0,0,5,4,0,0,6,5,3,1,0,0,0,0,0]
end_bal = [10,9,6,16,8,2,0,6,8,0,0,0,5,4,0,0,6,5,3,1,0,0,0,0,0,0]
withdraw = [0,1,3,0,9,6,2,0,0,8,0,0,0,1,4,0,0,1,2,2,1,0,0,0,0,0]
df_solution = pd.DataFrame(list(zip(id, order, consume, generate, begin_bal, end_bal, withdraw)),
columns =['id','Order','Consume', 'Generate', 'begin_bal', 'end_bal', 'Withdraw'])
def f(df):
# find block bondaries
ids = df["id"].values
bnds, = np.where(np.diff(ids, prepend=ids[0]-1, append=ids[-1]+1))
# find raw balance change
delta = (df["Generate"] - df["Consume"]).values
# find offset, so cumulative min does not interfere across ids
safe_total = (np.minimum(delta.min(), 0)-1) * np.diff(bnds[:-1])
# must apply offset just before group switch, so it aligns the first
# begin_bal, not end_bal, of the next group
# also keep a copy of original values at switches
delta_orig = delta[bnds[1:-1]-1]
delta[bnds[1:-1]-1] += safe_total - np.add.reduceat(delta, bnds[:-2])
# form free cumsum
acc = delta.cumsum()
# correct
acc -= np.minimum(0, np.minimum.accumulate(acc))
# write solution back to df
shft = np.empty_like(acc)
shft[1:] = acc[:-1]
shft[0] = 0
# reinstate last end_bal of each group
acc[bnds[1:-1]-1] = np.maximum(0, shft[bnds[1:-1]-1] + delta_orig)
df["begin_bal"] = shft
df["end_bal"] = acc
df["Withdraw"] = np.maximum(0, df["begin_bal"] - df["end_bal"])
Test:
f(df)
df == df_solution
Prints:
id Order Consume Generate begin_bal end_bal Withdraw
0 True True True True True True True
1 True True True True True True True
2 True True True True True True True
3 True True True True True True True
4 True True True True True True False
5 True True True True True True True
6 True True True True True True True
7 True True True True True True True
8 True True True True True True True
9 True True True True True True True
10 True True True True True True True
11 True True True True True True True
12 True True True True True True True
13 True True True True True True True
14 True True True True True True True
15 True True True True True True True
16 True True True True True True True
17 True True True True True True True
18 True True True True True True True
19 True True True True True True True
20 True True True True True True True
21 True True True True True True True
22 True True True True True True True
23 True True True True True True True
24 True True True True True True True
25 True True True True True True True
There is one False but that appears to be a typo in the expected output provided.
Using @PaulPanzer's logic here is a pandas version.
def CalcEB(x):
delta = x['Generate'] - x['Consume']
return delta.cumsum() - delta.cumsum().cummin().clip(-np.inf,0)
df['end_bal'] = df.groupby('id', as_index=False).apply(CalcEB).values
df['begin_bal'] = df.groupby('id')['end_bal'].shift().fillna(0)
df['Withdraw'] = (df['begin_bal'] - df['end_bal']).clip(0,np.inf)
df_pandas = df.copy()
#Note the typo mentioned by Paul Panzer
df_pandas.reindex(df_solution.columns, axis=1) == df_solution
Output (check dataframes)
id Order Consume Generate begin_bal end_bal Withdraw
0 True True True True True True True
1 True True True True True True True
2 True True True True True True True
3 True True True True True True True
4 True True True True True True False
5 True True True True True True True
6 True True True True True True True
7 True True True True True True True
8 True True True True True True True
9 True True True True True True True
10 True True True True True True True
11 True True True True True True True
12 True True True True True True True
13 True True True True True True True
14 True True True True True True True
15 True True True True True True True
16 True True True True True True True
17 True True True True True True True
18 True True True True True True True
19 True True True True True True True
20 True True True True True True True
21 True True True True True True True
22 True True True True True True True
23 True True True True True True True
24 True True True True True True True
25 True True True True True True True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With