Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic way to calculate streaks in pandas dataframe

Given df

df = pd.DataFrame([[1, 5, 2, 8, 2], [2, 4, 4, 20, 2], [3, 3, 1, 20, 2], [4, 2, 2, 1, 3], [5, 1, 4, -5, -4], [1, 5, 2, 2, -20], 
              [2, 4, 4, 3, -8], [3, 3, 1, -1, -1], [4, 2, 2, 0, 12], [5, 1, 4, 20, -2]],
             columns=['A', 'B', 'C', 'D', 'E'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Based on this answer, I created a function to calculate streaks (up, down).

def streaks(df, column):
    #Create sign column
    df['sign'] = 0
    df.loc[df[column] > 0, 'sign'] = 1
    df.loc[df[column] < 0, 'sign'] = 0
    # Downstreak
    df['d_streak2'] = (df['sign'] == 0).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 1, 'cumsum'] = df['d_streak2']
    df['cumsum'] = df['cumsum'].fillna(method='ffill')
    df['cumsum'] = df['cumsum'].fillna(0)
    df['d_streak'] = df['d_streak2'] - df['cumsum']
    df.drop(['d_streak2', 'cumsum'], axis=1, inplace=True)
    # Upstreak
    df['u_streak2'] = (df['sign'] == 1).cumsum()
    df['cumsum'] = np.nan
    df.loc[df['sign'] == 0, 'cumsum'] = df['u_streak2']
    df['cumsum'] = df['cumsum'].fillna(method='ffill')
    df['cumsum'] = df['cumsum'].fillna(0)
    df['u_streak'] = df['u_streak2'] - df['cumsum']
    df.drop(['u_streak2', 'cumsum'], axis=1, inplace=True)
    del df['sign']
    return df

The function works well, however is very long. I'm sure there's a much betterway to write this. I tried the other answer in but didn't work well.

This is the desired output

streaks(df, 'E')


    A   B   C    D     E    d_streak    u_streak
1   1   5   2    8     2         0.0    1.0
2   2   4   4   20     2         0.0    2.0
3   3   3   1   20     2         0.0    3.0
4   4   2   2    1     3         0.0    4.0
5   5   1   4   -5    -4         1.0    0.0
6   1   5   2    2   -20         2.0    0.0
7   2   4   4    3    -8         3.0    0.0
8   3   3   1   -1    -1         4.0    0.0
9   4   2   2    0    12         0.0    1.0
10  5   1   4   20    -2         1.0    0.0
like image 586
hernanavella Avatar asked Feb 22 '17 16:02

hernanavella


People also ask

How do you count lines in pandas?

You can use len(df. index) to find the number of rows in pandas DataFrame, df. index returns RangeIndex(start=0, stop=8, step=1) and use it on len() to get the count.

How does pandas calculate row percentage?

You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.

How does pandas calculate sum of rows?

Pandas DataFrame sum() Method The sum() method adds all values in each column and returns the sum for each column. By specifying the column axis ( axis='columns' ), the sum() method searches column-wise and returns the sum of each row.

How do you find the mean of multiple rows in pandas?

To calculate the mean of whole columns in the DataFrame, use pandas. Series. mean() with a list of DataFrame columns. You can also get the mean for all numeric columns using DataFrame.


1 Answers

You could simplify the function as shown:

def streaks(df, col):
    sign = np.sign(df[col])
    s = sign.groupby((sign!=sign.shift()).cumsum()).cumsum()
    return df.assign(u_streak=s.where(s>0, 0.0), d_streak=s.where(s<0, 0.0).abs())

Using it:

streaks(df, 'E')

enter image description here


Firstly, compute the sign of each cell present in the column under consideration using np.sign. These assign +1 to positive numbers and -1 to the negative.

Next, identify sets of adjacent values (comparing current cell and it's next) using sign!=sign.shift() and take it's cumulative sum which would serve in the grouping process.

Perform groupby letting these as the key/condition and again take the cumulative sum across the sub-group elements.

Finally, assign the positive computed cumsum values to ustreak and the negative ones (absolute value after taking their modulus) to dstreak.

like image 132
Nickil Maveli Avatar answered Sep 20 '22 02:09

Nickil Maveli