I have a pandas dataframe like this:
Balance Jan Feb Mar Apr
0 9.724135 0.389376 0.464451 0.229964 0.691504
1 1.114782 0.838406 0.679096 0.185135 0.143883
2 7.613946 0.960876 0.220274 0.788265 0.606402
3 0.144517 0.800086 0.287874 0.223539 0.206002
4 1.332838 0.430812 0.939402 0.045262 0.388466
I would like to group the rows by figuring out if the values from Jan through to Apr are monotonically decreasing (as in rows indexed 1 and 3) or not, and then add up the balances for each group, i.e. in the end I would like to end up with two numbers (1.259299 for the decreasing time series, and 18.670919 for the others).
I think if I could add a column "is decreasing" containg booleans I could do the sums using pandas' groupby, but how would I create this column?
Thanks, Anne
Pandas 0.19 added a Series.is_monotonic
attribute (as mentioned, the algos
module was removed from Pandas public API).
As @Liam notes in his answer, is_monotonic
is in fact an alias for is_monotonic_increasing
, so for clarity I'd recommended directly using either is_monotonic_increasing
or is_monotonic_decreasing
.
Anyway, both are non-strict (i.e. is_monotonic_decreasing
returns True
when the sequence is either decreasing or equal), but you can combine them with is_unqiue
if you need strictness.
my_df = pd.DataFrame({'A':[1,2,3],'B':[1,1,1],'C':[3,2,1]})
my_df
Out[32]:
A B C
0 1 1 3
1 2 1 2
2 3 1 1
my_df.apply(lambda x: x.is_monotonic_increasing) # row-wise monotonic (default axis is 0)
Out[33]:
A True
B True
C False
dtype: bool
my_df.apply(lambda x: x.is_monotonic_increasing, axis=1) # column-wise monotonic
Out[34]:
0 True
1 False
2 False
dtype: bool
You could use one of the is_monotonic
functions from algos:
In [10]: months = ['Jan', 'Feb', 'Mar', 'Apr']
In [11]: df.loc[:, months].apply(lambda x: pd.algos.is_monotonic_float64(-x)[0],
axis=1)
Out[11]:
0 False
1 True
2 False
3 True
4 False
dtype: bool
The is_monotonic
checks whether an array it's decreasing hence the -x.values
.
(This seems substantially faster than Tom's solution, even using the small DataFrame provided.)
months = ['Jan', 'Feb', 'Mar', 'Apr']
Transpose so that we can use the diff
method (which doesn't take an axis argument).
We fill in the first row (January) with 0. Otherwise it's NaN
.
In [77]: df[months].T.diff().fillna(0) <= 0
Out[77]:
0 1 2 3 4
Jan True True True True True
Feb False True True True False
Mar True True False True True
Apr False True True True False
To check if it's monotonically decreasing, use the .all() method. By default this goes over axis 0, the rows (months).
In [78]: is_decreasing = (df[months].T.diff().fillna(0) <= 0).all()
In [79]: is_decreasing
Out[79]:
0 False
1 True
2 False
3 True
4 False
dtype: bool
In [80]: df['is_decreasing'] = is_decreasing
In [81]: df
Out[81]:
Balance Jan Feb Mar Apr is_decreasing
0 9.724135 0.389376 0.464451 0.229964 0.691504 False
1 1.114782 0.838406 0.679096 0.185135 0.143883 True
2 7.613946 0.960876 0.220274 0.788265 0.606402 False
3 0.144517 0.800086 0.287874 0.223539 0.206002 True
4 1.332838 0.430812 0.939402 0.045262 0.388466 False
And like you suggested, we can groupby is_decreasing
and sum:
In [83]: df.groupby('is_decreasing')['Balance'].sum()
Out[83]:
is_decreasing
False 18.670919
True 1.259299
Name: Balance, dtype: float64
It's times like these when I love pandas.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With