I have a dataframe with the fields last_payout and amount. I need to sum all amount values for each month and plot the output.
df[['last_payout','amount']].dtypes
last_payout datetime64[ns]
amount float64
dtype: object
df[['last_payout','amount']].head
<bound method NDFrame.head of last_payout amount
0 2017-02-14 11:00:06 23401.0
1 2017-02-14 11:00:06 1444.0
2 2017-02-14 11:00:06 0.0
3 2017-02-14 11:00:06 0.0
4 2017-02-14 11:00:06 290083.0
I used the code from jezrael's answer to plot the number of transactions per month.
(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
.dt.to_period('M')
.value_counts()
.sort_index()
.plot(kind="bar")
)
Number of transactions per month:
How do I sum all amount values for each month and plot the output? How should I extend the code above to do this? I tried to use .sum but didn't succeed.
Pandas DataFrame sum() method: sum() adds all values in each column and returns the sum for each column. With axis='columns', it instead sums across the columns and returns the sum of each row.
Pandas Series cumsum() function: cumsum() returns the cumulative sum over a DataFrame or Series axis. The result is a DataFrame or Series of the same size as the input. The axis argument takes the index or the name of the axis; 0 is equivalent to None or 'index'.
To sum only specific rows, use the .loc indexer. Give the first and last row labels with the : operator; with .loc you can also choose which columns to include. The result can be displayed in a new column.
sum() returns the sum of the values along the requested axis. For the index axis (the default), it adds all the values in each column and returns a Series containing one total per column.
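A minimal sketch of these calls on a small, made-up frame. The toy data and variable names are illustrative assumptions, not taken from the question:

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the calls
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

col_totals = toy.sum()                # one total per column (default axis)
row_totals = toy.sum(axis="columns")  # one total per row
running = toy["a"].cumsum()           # running total, same length as the input
partial = toy.loc[0:1, "a"].sum()     # .loc label slices include both ends

print(col_totals, row_totals, running, partial, sep="\n")
```

None of these group by month on their own, which is why the monthly totals need groupby or resample.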
PeriodIndex solution:
Use groupby by month period (created with to_period) and aggregate with sum:
df['amount'].groupby(df['last_payout'].dt.to_period('M')).sum().plot(kind='bar')
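As a rough, self-contained sketch of the same idea, the snippet below rebuilds the sample data, keeps the 2016 to 2017 year filter from the question's code, and sums amount per month period. The names csv_text, mask, and monthly are assumptions made for illustration:

```python
from io import StringIO

import pandas as pd

# Sample rows in the same shape as the question's data
csv_text = """last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
df = pd.read_csv(StringIO(csv_text), parse_dates=["last_payout"])

# Same year filter as the transaction-count code in the question
mask = df["last_payout"].dt.year.between(2016, 2017)

# Group the amounts by the month period of each payout and sum them
monthly = (df.loc[mask, "amount"]
           .groupby(df.loc[mask, "last_payout"].dt.to_period("M"))
           .sum())
print(monthly)

# To draw the chart (requires matplotlib):
# monthly.plot(kind="bar")
```

Because the index is a PeriodIndex, the bar labels already read like 2017-02, so no extra label formatting should be needed.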
DatetimeIndex solutions:
Use resample by month ends (M) or by starts of months (MS) and aggregate with sum:
s = df.resample('M', on='last_payout')['amount'].sum()
#alternative
#s = df.groupby(pd.Grouper(freq='M', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-28 23401.0
2017-03-31 1444.0
2017-04-30 290083.0
Freq: M, Name: amount, dtype: float64
Or:
s = df.resample('MS', on='last_payout')['amount'].sum()
#s = df.groupby(pd.Grouper(freq='MS', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-01 23401.0
2017-03-01 1444.0
2017-04-01 290083.0
Freq: MS, Name: amount, dtype: float64
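One practical difference between the two approaches, sketched on made-up data with a gap in March: resample builds a bin for every month in the range and reports 0.0 for empty ones, while grouping by to_period only returns months that actually occur. The frame gappy and the result names are hypothetical. 'MS' is used here because newer pandas releases prefer 'ME' over 'M' for month ends, and 'MS' avoids that difference.

```python
import pandas as pd

# Hypothetical data with no payouts in March
gappy = pd.DataFrame({
    "last_payout": pd.to_datetime(["2017-02-14 11:00:06",
                                   "2017-04-14 11:00:06"]),
    "amount": [23401.0, 290083.0],
})

# resample creates every monthly bin between the first and last date
by_resample = gappy.resample("MS", on="last_payout")["amount"].sum()

# groupby on periods keeps only the months present in the data
by_period = (gappy["amount"]
             .groupby(gappy["last_payout"].dt.to_period("M"))
             .sum())

print(by_resample)
print(by_period)
```

Which behaviour is preferable depends on whether empty months should appear as zero-height bars in the plot.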
With a DatetimeIndex, the x labels then need to be formatted:
ax = s.plot(kind='bar')
ax.set_xticklabels(s.index.strftime('%Y-%m'))
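Put together, a possible end-to-end version might look like the sketch below. The helper plot_monthly is a hypothetical name introduced here, and the rotation and y-axis label are illustrative choices rather than part of the original answer:

```python
import pandas as pd

# Small sample in the shape of the question's data
df = pd.DataFrame({
    "last_payout": pd.to_datetime(["2017-02-14 11:00:06",
                                   "2017-03-14 11:00:06",
                                   "2017-04-14 11:00:06"]),
    "amount": [23401.0, 1444.0, 290083.0],
})

# Monthly totals indexed by the first day of each month
s = df.resample("MS", on="last_payout")["amount"].sum()

# Short year-month strings to replace the full timestamps on the x axis
labels = s.index.strftime("%Y-%m").tolist()


def plot_monthly(series, tick_labels):
    """Draw a bar chart of the totals; requires matplotlib to be installed."""
    ax = series.plot(kind="bar")
    ax.set_xticklabels(tick_labels, rotation=45)
    ax.set_ylabel("amount")
    return ax


print(labels)
# plot_monthly(s, labels)  # uncomment to draw the chart
```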
Setup:
import pandas as pd
from io import StringIO
temp=u"""last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=[0])
print (df)
last_payout amount
0 2017-02-14 11:00:06 23401.0
1 2017-03-14 11:00:06 1444.0
2 2017-03-14 11:00:06 0.0
3 2017-04-14 11:00:06 0.0
4 2017-04-14 11:00:06 290083.0