Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional Cumulative Sums in Pandas

I am a former Excel power user repenting for his sins. I need help recreating a common calculation for me.

I am trying to calculate the performance of a loan portfolio. In the numerator, I am calculating the cumulative total of losses. In the denominator, I need the original balance of the loans included in the cumulative total.

I cannot figure out how to do a conditional groupby in Pandas to accomplish this. It is very simple in Excel, so I am hoping that I am overthinking it.

I could not find much on the issue on StackOverflow, but this was the closest: python pandas conditional cumulative sum

The thing I cannot figure out is that my conditions are based on values in the index and contained in columns

Below is my data:

| Loan    | Origination | Balance | NCO Date  | NCO | As of Date | Age     (Months) | NCO Age (Months) |
|---------|-------------|---------|-----------|-----|------------|--------------|------------------|
| Loan 1  | 1/31/2011   | 1000    | 1/31/2018 | 25  | 5/31/2019  | 100              | 84               |
| Loan 2  | 3/31/2011   | 2500    |           | 0   | 5/31/2019  | 98           |                  |
| Loan 3  | 5/31/2011   | 3000    | 1/31/2019 | 15  | 5/31/2019  | 96           | 92               |
| Loan 4  | 7/31/2011   | 2500    |           | 0   | 5/31/2019  | 94           |                  |
| Loan 5  | 9/30/2011   | 1500    | 3/31/2019 | 35  | 5/31/2019  | 92           | 90               |
| Loan 6  | 11/30/2011  | 2500    |           | 0   | 5/31/2019  | 90           |                  |
| Loan 7  | 1/31/2012   | 1000    | 5/31/2019 | 5   | 5/31/2019  | 88           | 88               |
| Loan 8  | 3/31/2012   | 2500    |           | 0   | 5/31/2019  | 86           |                  |
| Loan 9  | 5/31/2012   | 1000    |           | 0   | 5/31/2019  | 84           |                  |
| Loan 10 | 7/31/2012   | 1250    |           | 0   | 5/31/2019  | 82           |                  |

In Excel, I would calculate this total using the following formulas:

Outstanding Balance Line: =SUMIFS(Balance,Age (Months),Reference Age)

Cumulative NCO: =SUMIFS(NCO,Age (Months),>=Reference Age,NCO Age (Months),<=&Reference Age)

Data:

| Reference Age       | 85    | 90    | 95   | 100  
|---------------------|-------|-------|------|------
| Outstanding Balance | 16500 | 13000 | 6500 | 1000 
| Cumulative NCO      | 25    | 60    | 40   | 25   

The goal here is to include things in Outstanding Balance that are old enough to have an observation for NCO. And NCOs are the total amount that have occurred up until that point for those loans outstanding.

EDIT:

I have gotten a calculation this way. But is this the most efficient?

age_bins = list(np.arange(85, 101, 5))
final_df = pd.DataFrame()
df.fillna(value=0, inplace=True)
df["NCO Age (Months)"] = df["NCO Age (Months)"].astype(int)

for x in age_bins:

    age = x

    nco = df.loc[(df["Age (Months)"] >= x) & (df["NCO Age (Months)"] <= x), "NCO"].sum()

    bal = df.loc[(df["Age (Months)"] >= x), "Balance"].sum()

    temp_df = pd.DataFrame(
        data=[[age, nco, bal]],
        columns=["Age", "Cumulative NCO", "Outstanding Balance"],
        index=[age],
    )

    final_df = final_df.append(temp_df, sort=True)
like image 595
Russ W Avatar asked Jun 14 '19 16:06

Russ W


1 Answers

You use a complex conditions depending on variables. It is easy to find a vectorized way for simple cumulative sums, but I cannot imagine a nice way for the Cumulative NCO.

So I would revert to Python comprehensions:

data = [
    { 'Reference Age': ref,
      'Outstanding Balance': df.loc[df.iloc[:,6]>=ref,'Balance'].sum(),
      'Cumulative NCO': df.loc[(df.iloc[:,6]>=ref)&(df.iloc[:,7]<=ref),
                   'NCO'].sum() }
    for ref in [85, 90, 95, 100]]

result = pd.DataFrame(data).set_index('Reference Age').T

It produces:

Reference Age          85     90    95    100
Cumulative NCO          25     60    40    25
Outstanding Balance  16500  13000  6500  1000
like image 135
Serge Ballesta Avatar answered Oct 19 '22 20:10

Serge Ballesta