I am a former Excel power user repenting for his sins. I need help recreating a common calculation for me.
I am trying to calculate the performance of a loan portfolio. In the numerator, I am calculating the cumulative total of losses. In the denominator, I need the original balance of the loans included in the cumulative total.
I cannot figure out how to do a conditional groupby in Pandas to accomplish this. It is very simple in Excel, so I am hoping that I am overthinking it.
I could not find much on the issue on StackOverflow, but this was the closest: python pandas conditional cumulative sum
The thing I cannot figure out is that my conditions are based on values in the index and contained in columns
Below is my data:
| Loan | Origination | Balance | NCO Date | NCO | As of Date | Age (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1 | 1/31/2011 | 1000 | 1/31/2018 | 25 | 5/31/2019 | 100 | 84 |
| Loan 2 | 3/31/2011 | 2500 | | 0 | 5/31/2019 | 98 | |
| Loan 3 | 5/31/2011 | 3000 | 1/31/2019 | 15 | 5/31/2019 | 96 | 92 |
| Loan 4 | 7/31/2011 | 2500 | | 0 | 5/31/2019 | 94 | |
| Loan 5 | 9/30/2011 | 1500 | 3/31/2019 | 35 | 5/31/2019 | 92 | 90 |
| Loan 6 | 11/30/2011 | 2500 | | 0 | 5/31/2019 | 90 | |
| Loan 7 | 1/31/2012 | 1000 | 5/31/2019 | 5 | 5/31/2019 | 88 | 88 |
| Loan 8 | 3/31/2012 | 2500 | | 0 | 5/31/2019 | 86 | |
| Loan 9 | 5/31/2012 | 1000 | | 0 | 5/31/2019 | 84 | |
| Loan 10 | 7/31/2012 | 1250 | | 0 | 5/31/2019 | 82 | |
In Excel, I would calculate this total using the following formulas:
Outstanding Balance Line: =SUMIFS(Balance,Age (Months),Reference Age)
Cumulative NCO: =SUMIFS(NCO,Age (Months),>=Reference Age,NCO Age (Months),<=&Reference Age)
Data:
| Reference Age | 85 | 90 | 95 | 100
|---------------------|-------|-------|------|------
| Outstanding Balance | 16500 | 13000 | 6500 | 1000
| Cumulative NCO | 25 | 60 | 40 | 25
The goal here is to include things in Outstanding Balance that are old enough to have an observation for NCO. And NCOs are the total amount that have occurred up until that point for those loans outstanding.
EDIT:
I have gotten a calculation this way. But is this the most efficient?
age_bins = list(np.arange(85, 101, 5))
final_df = pd.DataFrame()
df.fillna(value=0, inplace=True)
df["NCO Age (Months)"] = df["NCO Age (Months)"].astype(int)
for x in age_bins:
age = x
nco = df.loc[(df["Age (Months)"] >= x) & (df["NCO Age (Months)"] <= x), "NCO"].sum()
bal = df.loc[(df["Age (Months)"] >= x), "Balance"].sum()
temp_df = pd.DataFrame(
data=[[age, nco, bal]],
columns=["Age", "Cumulative NCO", "Outstanding Balance"],
index=[age],
)
final_df = final_df.append(temp_df, sort=True)
You use a complex conditions depending on variables. It is easy to find a vectorized way for simple cumulative sums, but I cannot imagine a nice way for the Cumulative NCO.
So I would revert to Python comprehensions:
data = [
{ 'Reference Age': ref,
'Outstanding Balance': df.loc[df.iloc[:,6]>=ref,'Balance'].sum(),
'Cumulative NCO': df.loc[(df.iloc[:,6]>=ref)&(df.iloc[:,7]<=ref),
'NCO'].sum() }
for ref in [85, 90, 95, 100]]
result = pd.DataFrame(data).set_index('Reference Age').T
It produces:
Reference Age 85 90 95 100
Cumulative NCO 25 60 40 25
Outstanding Balance 16500 13000 6500 1000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With