Given df
df = pd.DataFrame([[1, 5, 2, 8, 2], [2, 4, 4, 20, 2], [3, 3, 1, 20, 2], [4, 2, 2, 1, 3], [5, 1, 4, -5, -4], [1, 5, 2, 2, -20],
[2, 4, 4, 3, -8], [3, 3, 1, -1, -1], [4, 2, 2, 0, 12], [5, 1, 4, 20, -2]],
columns=['A', 'B', 'C', 'D', 'E'], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Based on this answer, I created a function to calculate streaks (up, down).
def streaks(df, column):
#Create sign column
df['sign'] = 0
df.loc[df[column] > 0, 'sign'] = 1
df.loc[df[column] < 0, 'sign'] = 0
# Downstreak
df['d_streak2'] = (df['sign'] == 0).cumsum()
df['cumsum'] = np.nan
df.loc[df['sign'] == 1, 'cumsum'] = df['d_streak2']
df['cumsum'] = df['cumsum'].fillna(method='ffill')
df['cumsum'] = df['cumsum'].fillna(0)
df['d_streak'] = df['d_streak2'] - df['cumsum']
df.drop(['d_streak2', 'cumsum'], axis=1, inplace=True)
# Upstreak
df['u_streak2'] = (df['sign'] == 1).cumsum()
df['cumsum'] = np.nan
df.loc[df['sign'] == 0, 'cumsum'] = df['u_streak2']
df['cumsum'] = df['cumsum'].fillna(method='ffill')
df['cumsum'] = df['cumsum'].fillna(0)
df['u_streak'] = df['u_streak2'] - df['cumsum']
df.drop(['u_streak2', 'cumsum'], axis=1, inplace=True)
del df['sign']
return df
The function works well, however is very long. I'm sure there's a much betterway to write this. I tried the other answer in but didn't work well.
This is the desired output
streaks(df, 'E')
A B C D E d_streak u_streak
1 1 5 2 8 2 0.0 1.0
2 2 4 4 20 2 0.0 2.0
3 3 3 1 20 2 0.0 3.0
4 4 2 2 1 3 0.0 4.0
5 5 1 4 -5 -4 1.0 0.0
6 1 5 2 2 -20 2.0 0.0
7 2 4 4 3 -8 3.0 0.0
8 3 3 1 -1 -1 4.0 0.0
9 4 2 2 0 12 0.0 1.0
10 5 1 4 20 -2 1.0 0.0
You can use len(df. index) to find the number of rows in pandas DataFrame, df. index returns RangeIndex(start=0, stop=8, step=1) and use it on len() to get the count.
You can caluclate pandas percentage with total by groupby() and DataFrame. transform() method. The transform() method allows you to execute a function for each value of the DataFrame. Here, the percentage directly summarized DataFrame, then the results will be calculated using all the data.
Pandas DataFrame sum() Method The sum() method adds all values in each column and returns the sum for each column. By specifying the column axis ( axis='columns' ), the sum() method searches column-wise and returns the sum of each row.
To calculate the mean of whole columns in the DataFrame, use pandas. Series. mean() with a list of DataFrame columns. You can also get the mean for all numeric columns using DataFrame.
You could simplify the function as shown:
def streaks(df, col):
sign = np.sign(df[col])
s = sign.groupby((sign!=sign.shift()).cumsum()).cumsum()
return df.assign(u_streak=s.where(s>0, 0.0), d_streak=s.where(s<0, 0.0).abs())
Using it:
streaks(df, 'E')
Firstly, compute the sign of each cell present in the column under consideration using np.sign
. These assign +1 to positive numbers and -1 to the negative.
Next, identify sets of adjacent values (comparing current cell and it's next) using sign!=sign.shift()
and take it's cumulative sum which would serve in the grouping process.
Perform groupby
letting these as the key/condition and again take the cumulative sum across the sub-group elements.
Finally, assign the positive computed cumsum
values to ustreak
and the negative ones (absolute value after taking their modulus) to dstreak
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With