Pythonic way to calculate streaks in pandas dataframe

Given the following DataFrame df:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 5, 2, 8, 2], [2, 4, 4, 20, 2], [3, 3, 1, 20, 2], [4, 2, 2, 1, 3], [5, 1, 4, -5, -4], [1, 5, 2, 2, -20],
              [2, 4, 4, 3, -8], [3, 3, 1, -1, -1], [4, 2, 2, 0, 12], [5, 1, 4, 20, -2]],
             columns=['A', 'B', 'C', 'D', 'E'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Based on this answer, I created a function to calculate streaks (up, down).

def streaks(df, column):
    # Note: the streak columns are added to df in place
    # Create sign column: 1 for positive values, 0 for zero or negative values
    df['sign'] = 0
    df.loc[df[column] > 0, 'sign'] = 1
    df.loc[df[column] < 0, 'sign'] = 0
    # Downstreak: running count of non-positive rows, reset at each positive row
    df['d_streak2'] = (df['sign'] == 0).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 1, 'cumsum'] = df['d_streak2']
    # .ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas
    df['cumsum'] = df['cumsum'].ffill()
    df['cumsum'] = df['cumsum'].fillna(0)
    df['d_streak'] = df['d_streak2'] - df['cumsum']
    df.drop(['d_streak2', 'cumsum'], axis=1, inplace=True)
    # Upstreak: running count of positive rows, reset at each non-positive row
    df['u_streak2'] = (df['sign'] == 1).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 0, 'cumsum'] = df['u_streak2']
    df['cumsum'] = df['cumsum'].ffill()
    df['cumsum'] = df['cumsum'].fillna(0)
    df['u_streak'] = df['u_streak2'] - df['cumsum']
    df.drop(['u_streak2', 'cumsum'], axis=1, inplace=True)
    del df['sign']
    return df

The function works well, but it is very long. I'm sure there's a much better way to write this. I tried the other answer, but it didn't work well.

This is the desired output:

streaks(df, 'E')


    A   B   C    D     E    d_streak    u_streak
1   1   5   2    8     2         0.0    1.0
2   2   4   4   20     2         0.0    2.0
3   3   3   1   20     2         0.0    3.0
4   4   2   2    1     3         0.0    4.0
5   5   1   4   -5    -4         1.0    0.0
6   1   5   2    2   -20         2.0    0.0
7   2   4   4    3    -8         3.0    0.0
8   3   3   1   -1    -1         4.0    0.0
9   4   2   2    0    12         0.0    1.0
10  5   1   4   20    -2         1.0    0.0

asked Feb 22 '17 16:02

hernanavella

1 Answer

You could simplify the function as shown:

def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign!=sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s>0, 0.0), d_streak=s.where(s<0, 0.0).abs())

Using it:

streaks(df, 'E')

First, compute the sign of each cell in the column under consideration using np.sign. This assigns +1 to positive numbers, -1 to negative numbers, and 0 to zeros.
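As a quick check of this step, here is a minimal sketch on a small made-up Series (the values are illustrative and not taken from the question). It also shows that zero maps to 0, so with this approach a zero would belong to neither streak, whereas the function in the question treats a zero as part of a down streak.

```python
import numpy as np
import pandas as pd

# Illustrative values only (not from the question's data)
values = pd.Series([3, -7, 0, 12, -1])

# np.sign works element-wise and returns a Series of -1, 0 or +1
sign = np.sign(values)
print(sign.tolist())  # [1, -1, 0, 1, -1]
```

If zeros should count toward the down streak, as in the original function, the sign step would need adjusting before the grouping.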

Next, identify runs of adjacent values with the same sign by comparing each cell with the previous one using sign != sign.shift(), and take the cumulative sum of the result, which serves as the grouping key.
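The run-labelling idea can be seen in isolation with column E from the question. This is only a sketch; the names e, changed and run_id are introduced here for illustration and do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Column E from the question
e = pd.Series([2, 2, 2, 3, -4, -20, -8, -1, 12, -2], index=range(1, 11))
sign = np.sign(e)

# True wherever the sign differs from the previous row; the first row is
# always True because it is compared with the NaN introduced by shift()
changed = sign != sign.shift()

# The cumulative sum of the change flags gives one integer label per run
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 4]
```

Rows that share a label form one uninterrupted run of the same sign.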

Perform a groupby with these run labels as the key and again take the cumulative sum, this time of the signs within each group.

Finally, assign the positive cumulative sums to u_streak and the negative ones (after taking their absolute value) to d_streak.
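Putting the last two steps together, the sketch below recomputes the streaks for column E step by step, keeping the intermediate signed streak s visible. The variable names are illustrative; the logic follows the function above.

```python
import numpy as np
import pandas as pd

# Column E from the question
e = pd.Series([2, 2, 2, 3, -4, -20, -8, -1, 12, -2], index=range(1, 11))

sign = np.sign(e)
run_id = (sign != sign.shift()).cumsum()

# Cumulative sum of the signs within each run: counts up for positive
# runs and down (negative numbers) for negative runs
s = sign.groupby(run_id).cumsum()
print(s.tolist())  # [1, 2, 3, 4, -1, -2, -3, -4, 1, -1]

# Keep the positive part as the up streak, the negated negative part
# as the down streak, and 0 everywhere else
u_streak = s.where(s > 0, 0)
d_streak = s.where(s < 0, 0).abs()

result = pd.DataFrame({'E': e, 'd_streak': d_streak, 'u_streak': u_streak})
print(result)
```

The resulting streak values agree with the desired output in the question, although the dtype (integer or float) can differ from the original function's output.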

answered Sep 20 '22 02:09

Nickil Maveli

Tags: python, python-3.x, pandas, dataframe, numpy
