How to sum all amounts by date in pandas dataframe?

Tags:

python

pandas

I have a DataFrame with the columns last_payout and amount. I need to sum the amounts for each month and plot the result.

df[['last_payout','amount']].dtypes

last_payout    datetime64[ns]
amount           float64
dtype: object

df[['last_payout','amount']].head()

          last_payout    amount
0 2017-02-14 11:00:06   23401.0
1 2017-02-14 11:00:06    1444.0
2 2017-02-14 11:00:06       0.0
3 2017-02-14 11:00:06       0.0
4 2017-02-14 11:00:06  290083.0

I used the code from jezrael's answer to plot the number of transactions per month.

(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
         .dt.to_period('M')
         .value_counts()
         .sort_index()
         .plot(kind="bar")
)

Number of transactions per month:

[image: bar chart of the number of transactions per month]

How do I sum the amount for each month and plot the result? How should I extend the code above to do this?

I tried adding .sum(), but it didn't work.


asked Apr 07 '18 21:04

user40

1 Answer

PeriodIndex solution:

Group by monthly periods created with to_period and aggregate with sum:

df['amount'].groupby(df['last_payout'].dt.to_period('M')).sum().plot(kind='bar')
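A minimal sketch of how the question's own chain could be extended: keep its 2016 to 2017 year filter, but select whole rows instead of only last_payout, then group amount by monthly period and sum. The sample data is copied from the Setup section; the variable names (csv_text, mask, filtered, monthly) are illustrative.

```python
from io import StringIO

import pandas as pd

# Sample data copied from the answer's Setup section
csv_text = """last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
df = pd.read_csv(StringIO(csv_text), parse_dates=["last_payout"])

# Same year filter as in the question, but keep all columns of the matching rows
mask = df["last_payout"].dt.year.between(2016, 2017)
filtered = df.loc[mask]

# Group the amounts by calendar month (a Period such as 2017-03) and sum each group
monthly = (
    filtered["amount"]
    .groupby(filtered["last_payout"].dt.to_period("M"))
    .sum()
)
print(monthly)

# To draw the chart: monthly.plot(kind="bar")
```

Unlike value_counts, which only counts rows per month, the groupby keeps the amount values so they can be summed.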

DatetimeIndex solutions:

Use resample with a month-end (M) or month-start (MS) frequency and aggregate with sum (in pandas 2.2 and later the month-end alias is ME, and M is deprecated):

s = df.resample('M', on='last_payout')['amount'].sum()
#alternative
#s = df.groupby(pd.Grouper(freq='M', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-28     23401.0
2017-03-31      1444.0
2017-04-30    290083.0
Freq: M, Name: amount, dtype: float64

Or:

s = df.resample('MS', on='last_payout')['amount'].sum()
#s = df.groupby(pd.Grouper(freq='MS', key='last_payout'))['amount'].sum()
print (s)
last_payout
2017-02-01     23401.0
2017-03-01      1444.0
2017-04-01    290083.0
Freq: MS, Name: amount, dtype: float64
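One practical difference between the DatetimeIndex and PeriodIndex solutions, shown as a sketch on hypothetical data with no payout in March 2017: resample and pd.Grouper create a bin for every month in the range, so March appears with a sum of 0, while grouping by to_period only creates groups for months that occur in the data. The sketch uses MS because that alias is accepted across pandas versions.

```python
import pandas as pd

# Hypothetical data: payouts in February and April, none in March
df = pd.DataFrame({
    "last_payout": pd.to_datetime([
        "2017-02-14 11:00:06",
        "2017-04-14 11:00:06",
        "2017-04-20 09:30:00",
    ]),
    "amount": [23401.0, 1444.0, 290083.0],
})

# resample: month-start bins covering the whole range; empty months sum to 0
by_resample = df.resample("MS", on="last_payout")["amount"].sum()

# pd.Grouper with the same frequency produces the same bins
by_grouper = df.groupby(pd.Grouper(key="last_payout", freq="MS"))["amount"].sum()

# Period-based groupby: only months present in the data become groups
by_period = df["amount"].groupby(df["last_payout"].dt.to_period("M")).sum()

print(by_resample)
print(by_period)
```

If the bar chart should show months without payouts as zero-height bars, the resample or Grouper version is the more convenient choice.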

Then the x-axis labels need to be formatted:

ax = s.plot(kind='bar')
ax.set_xticklabels(s.index.strftime('%Y-%m'))
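A self-contained sketch of the label formatting step, assuming matplotlib is installed. It selects the non-interactive Agg backend so it runs without a display, and rebuilds s from the month-start sums printed above.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Monthly sums as printed above for the 'MS' resample
s = pd.Series(
    [23401.0, 1444.0, 290083.0],
    index=pd.DatetimeIndex(["2017-02-01", "2017-03-01", "2017-04-01"], name="last_payout"),
    name="amount",
)

ax = s.plot(kind="bar")
# A bar plot labels each bar with the full timestamp; show only year and month
ax.set_xticklabels(s.index.strftime("%Y-%m"))

labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
plt.close(ax.figure)
```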

[image: bar chart of the summed amount per month]

Setup:

import pandas as pd
from io import StringIO

temp = """last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=[0])
print (df)
          last_payout    amount
0 2017-02-14 11:00:06   23401.0
1 2017-03-14 11:00:06    1444.0
2 2017-03-14 11:00:06       0.0
3 2017-04-14 11:00:06       0.0
4 2017-04-14 11:00:06  290083.0

answered Nov 01 '22 15:11

jezrael
