Apply custom cumulative function to pandas dataframe

Tags: python, pandas

I have a dataframe sorted by date:

import pandas as pd

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
# sort_values replaces DataFrame.sort, which newer pandas versions no longer provide
df = df.sort_values('date')
print(df)

         date  idx  pct_val  val
3  2016-04-30    2      NaN   10
0  2016-04-30    1      NaN   10
4  2016-05-31    2      -10    0
1  2016-05-31    1      -10    0
5  2016-06-31    2      -10    0
2  2016-06-31    1      NaN    5

I want to group by idx and then apply a cumulative function with some simple logic: if pct_val is null, add val to the running total; otherwise, multiply the running total by 1 + pct_val/100. In the table below, 'cumsum' shows the result of df.groupby('idx').val.cumsum() and 'cumulative_func' is the result I want.

         date  idx  pct_val  val  cumsum  cumulative_func
3  2016-04-30    2      NaN   10      10               10
0  2016-04-30    1      NaN   10      10               10
4  2016-05-31    2      -10    0      10                9
1  2016-05-31    1      -10    0      10                9
5  2016-06-31    2      -10    0      10                8
2  2016-06-31    1      NaN    5      15               14
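
For reference, here is a minimal sketch of how the 'cumsum' column above can be reproduced, followed by a hand trace of the stated rule for idx 2. The names `demo`, `total`, and `trace` are illustrative only and not from the original post. Note that applying the rule literally gives 10 * 0.9 * 0.9 = 8.1 for the last idx 2 row, so the 8 in the table appears to be rounded.

```python
import pandas as pd

# Same idx/val data as in the question, in original (unsorted) row order.
demo = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                     'val': [10, 0, 5, 10, 0, 0]})

# Plain per-group running sum: this is the 'cumsum' column.
demo['cumsum'] = demo.groupby('idx')['val'].cumsum()
print(demo)

# Hand trace of the desired rule for idx == 2:
# a missing pct_val adds val, otherwise the total is scaled by 1 + pct_val/100.
total = 0.0
trace = []
for pct, val in [(None, 10), (-10, 0), (-10, 0)]:
    total = total + val if pct is None else total * (1 + pct / 100)
    trace.append(round(total, 10))
print(trace)
```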

Any idea if there is a way to apply a custom cumulative function to a dataframe, or a better way to achieve this?

asked May 17 '16 18:05

user2899059

2 Answers

I don't believe there is an easy way to accomplish your objective using vectorization. I would first try to get something working, and then optimize for speed if required.

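import numpy as np
import pandas as pd

def cumulative_func(df):
    results = []
    # .groups maps each idx value to the index labels of its rows
    for group in df.groupby('idx').groups.values():
        total = 0
        result = []
        # .loc is used here because the old .ix indexer is no longer available
        for p, v in df.loc[group, ['pct_val', 'val']].values:
            if np.isnan(p):
                total += v                # no percentage: add val
            else:
                total *= (1 + .01 * p)    # percentage given: scale the total
            result.append(total)
        results.append(pd.Series(result, index=group))
    # restore the original row order
    return pd.concat(results).reindex(df.index)

df['cumulative_func'] = cumulative_func(df)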

>>> df
         date  idx  pct_val  val  cumulative_func
3  2016-04-30    2      NaN   10             10.0
0  2016-04-30    1      NaN   10             10.0
4  2016-05-31    2      -10    0              9.0
1  2016-05-31    1      -10    0              9.0
5  2016-06-31    2      -10    0              8.1
2  2016-06-31    1      NaN    5             14.0
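
As a possible alternative to the explicit loop, the rule can be written as the recurrence total_t = f_t * total_(t-1) + a_t, where f_t is 1 and a_t is val on rows with a missing pct_val, and f_t is 1 + pct_val/100 and a_t is 0 otherwise. That recurrence has a closed form built from a cumulative product and a cumulative sum. The sketch below is not from the original answer: the function name `cumulative_closed_form` and the frame `demo` are hypothetical, and it assumes no row has pct_val equal to -100 (a zero factor would cause a division by zero) and that rows are already in the desired order within each group.

```python
import numpy as np
import pandas as pd

def cumulative_closed_form(df):
    """Hypothetical helper: loop-free version of the add-or-scale rule."""
    is_add = df['pct_val'].isna()
    # Multiplicative factor per row: 1 on "add" rows, 1 + pct/100 otherwise.
    factor = (1 + df['pct_val'] / 100).where(~is_add, 1.0)
    # Additive term per row: val on "add" rows, 0 otherwise.
    addend = df['val'].where(is_add, 0.0)
    # Running product of factors within each idx group.
    cum_factor = factor.groupby(df['idx']).cumprod()
    # Each addend is divided by the product so far, summed, then rescaled.
    scaled_sum = (addend / cum_factor).groupby(df['idx']).cumsum()
    return cum_factor * scaled_sum

demo = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                     'val': [10, 0, 5, 10, 0, 0],
                     'pct_val': [np.nan, -10, np.nan, np.nan, -10, -10]})
result = cumulative_closed_form(demo)
print(result)
```

Because of the division and re-multiplication, results can differ from the loop in the last few floating-point digits, so the loop remains the safer reference implementation.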

answered Oct 26 '22 19:10

Alexander

First, I cleaned up your setup.

Setup

df = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                   'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                            '2016-04-30', '2016-05-31', '2016-06-31'],
                   'val': [10, 0, 5, 10, 0, 0],
                   'pct_val': [None, -10, None, None, -10, -10]})
df = df.sort_values(['date', 'idx'])
print(df)

Looks like:

         date  idx  pct_val  val
0  2016-04-30    1      NaN   10
3  2016-04-30    2      NaN   10
1  2016-05-31    1    -10.0    0
4  2016-05-31    2    -10.0    0
2  2016-06-31    1      NaN    5
5  2016-06-31    2    -10.0    0

Solution

def cumcustom(df):
    df = df.copy()  # avoid mutating the group passed in by groupby
    running_total = 0
    for idx, row in df.iterrows():
        # plain label access replaces the old row.ix[...] lookups
        if pd.isnull(row['pct_val']):
            running_total += row['val']
        else:
            running_total *= row['pct_val'] / 100. + 1
        df.loc[idx, 'cumcustom'] = running_total
    return df

Then apply

df.groupby('idx').apply(cumcustom).reset_index(drop=True).sort_values(['date', 'idx'])

Looks like:

         date  idx  pct_val  val  cumcustom
0  2016-04-30    1      NaN   10       10.0
3  2016-04-30    2      NaN   10       10.0
1  2016-05-31    1    -10.0    0        9.0
4  2016-05-31    2    -10.0    0        9.0
2  2016-06-31    1      NaN    5       14.0
5  2016-06-31    2    -10.0    0        8.1
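
If the per-row `df.loc` assignment or the behaviour of `groupby(...).apply` across pandas versions is a concern, one option is to skip the groupby and keep a dictionary of running totals keyed by idx while walking the rows once. This is only a sketch under that assumption; `running_totals_single_pass` and `demo` are hypothetical names, not part of the answer above.

```python
import numpy as np
import pandas as pd

def running_totals_single_pass(keys, pcts, vals):
    """Hypothetical helper: apply the add-or-scale rule per key in one pass."""
    totals = {}  # latest running total seen for each key
    out = []
    for key, pct, val in zip(keys, pcts, vals):
        total = totals.get(key, 0.0)
        if pd.isna(pct):
            total += val               # no percentage: add val
        else:
            total *= 1 + pct / 100.0   # percentage given: scale the total
        totals[key] = total
        out.append(total)
    return out

demo = pd.DataFrame({'idx': [1, 1, 1, 2, 2, 2],
                     'date': ['2016-04-30', '2016-05-31', '2016-06-31',
                              '2016-04-30', '2016-05-31', '2016-06-31'],
                     'val': [10, 0, 5, 10, 0, 0],
                     'pct_val': [np.nan, -10, np.nan, np.nan, -10, -10]})
demo = demo.sort_values(['date', 'idx'])

# The list is assigned positionally, so it lines up with the sorted rows.
demo['cumcustom'] = running_totals_single_pass(
    demo['idx'], demo['pct_val'], demo['val'])
print(demo)
```

The rows only need to be in date order within each idx, so the original index is preserved and no reset_index or re-sort is required afterwards.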

answered Oct 26 '22 18:10

piRSquared
