Pandas cumulative sum on column with condition

Tags: python, pandas, dataframe

I didn't find an answer elsewhere, so I need to ask. That is probably because I don't know what this operation is called. (English is not my native language.)

I have a large data frame with a datetime index, and the time order matters. One column in the df holds the values [NaN, 1, -1]. I need a fast way to compute a cumulative sum that resets whenever the value changes.

Example.

    Time                   sign   desire_value
    2014-01-24 05:00:00    NaN    NaN
    2014-01-24 06:00:00    NaN    NaN
    2014-01-24 07:00:00    NaN    NaN
    2014-01-24 08:00:00    1      1
    2014-01-24 09:00:00    1      2
    2014-01-24 10:00:00    1      3
    2014-01-24 11:00:00    -1     1
    2014-01-24 12:00:00    -1     2
    2014-01-24 13:00:00    -1     3
    2014-01-24 14:00:00    -1     4
    2014-01-24 15:00:00    -1     5
    2014-01-24 16:00:00    1      1
    2014-01-24 17:00:00    1      2
    2014-01-24 18:00:00    1      3
    2014-01-24 19:00:00    -1     1
    2014-01-24 20:00:00    -1     2
    2014-01-24 21:00:00    1      1
    2014-01-24 22:00:00    1      2

I have a working solution that loops over the rows, but it is not very efficient.

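    import pandas as pd

    df['sign_1'] = df['sign'].shift(1)
    acc = 0  # length of the current run of equal signs

    for index, row in df.iterrows():
        if pd.isna(row['sign']):
            # no sign yet, so there is nothing to count
            df.loc[index, 'desire_value'] = None
        elif row['sign'] == row['sign_1']:
            # same sign as the previous row: extend the run
            acc += 1
            df.loc[index, 'desire_value'] = acc
        else:
            # sign changed (or first non-NaN row): restart the count
            acc = 1
            df.loc[index, 'desire_value'] = acc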

I cannot find any array-based approach. I read that the best way to iterate efficiently in Python is to use Cython, but is there a more Pythonic way to solve this?


asked Apr 02 '15 19:04

rpeczykowski

1 Answer

see the last section here

This is an itertools-like groupby: consecutive equal values form one group.

In [86]: v = df['value'].dropna()

The grouper flags the breakpoints where the value differs from the previous one; cumsum over those flags gives each run of equal values its own group number.

In [87]: grouper = (v!=v.shift()).cumsum()

In [88]: grouper
Out[88]: 
3     1
4     1
5     1
6     2
7     2
8     2
9     2
10    2
11    3
12    3
13    3
14    4
15    4
16    5
17    5
Name: value, dtype: int64
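To make the two steps behind the grouper easier to follow, here is a minimal sketch on a short, made-up series (the data and the name `changed` are illustrative assumptions, not part of the original answer):

```python
import pandas as pd

# Hypothetical short series standing in for the NaN-free column.
v = pd.Series([1, 1, -1, -1, -1, 1])

# True wherever a value differs from the one before it. The first row is
# always True because it is compared against the NaN produced by shift().
changed = v != v.shift()

# A running total of the True flags gives each run its own integer label.
grouper = changed.cumsum()

print(changed.tolist())  # [True, False, True, False, False, True]
print(grouper.tolist())  # [1, 1, 2, 2, 2, 3]
```

Rows that share a label belong to the same run of consecutive equal values, which is what the groupby in the next step relies on.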

Then it is just a simple cumsum within each group:

In [89]: df.groupby(grouper)['value'].cumsum()
Out[89]: 
0    NaN
1    NaN
2    NaN
3      1
4      2
5      3
6     -1
7     -2
8     -3
9     -4
10    -5
11     1
12     2
13     3
14    -1
15    -2
16     1
17     2
dtype: float64

You can certainly .abs() the above if you do in fact want the absolute values.
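Putting the pieces together, the following is a self-contained sketch that rebuilds the data from the question and fills `desire_value`. It assumes the column is called `sign` as in the question (the transcript above uses `value`), and it groups the NaN-free series `v` directly rather than the full frame, which is a choice made here for the sketch.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example: 3 NaN rows, then runs of 1 and -1.
signs = [np.nan] * 3 + [1] * 3 + [-1] * 5 + [1] * 3 + [-1] * 2 + [1] * 2
index = pd.date_range("2014-01-24 05:00:00", periods=len(signs),
                      freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"sign": signs}, index=index)

# Drop the NaN rows, then label each run of consecutive equal values.
v = df["sign"].dropna()
grouper = (v != v.shift()).cumsum()

# Cumulative sum within each run; abs() turns -1, -2, ... into 1, 2, ...
# Assignment aligns on the index, so the dropped NaN rows stay NaN.
df["desire_value"] = v.groupby(grouper).cumsum().abs()

# Alternative that does not depend on the values being +1 or -1:
# number the rows inside each run, starting from 1.
counts = v.groupby(grouper).cumcount() + 1

print(df)
```

The `cumcount() + 1` variant gives the same counts here and would keep working if the column held values other than 1 and -1.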

answered Nov 04 '22 19:11

Jeff

