How to sum all amounts by date in pandas dataframe?

Tags: python, pandas

I have a DataFrame with the columns last_payout and amount. I need to sum the amounts for each month and plot the result.

df[['last_payout','amount']].dtypes

last_payout    datetime64[ns]
amount           float64
dtype: object

df[['last_payout','amount']].head()

          last_payout    amount
0 2017-02-14 11:00:06   23401.0
1 2017-02-14 11:00:06    1444.0
2 2017-02-14 11:00:06       0.0
3 2017-02-14 11:00:06       0.0
4 2017-02-14 11:00:06  290083.0

I used the code from jezrael's answer to plot the number of transactions per month.

(df.loc[df['last_payout'].dt.year.between(2016, 2017), 'last_payout']
         .dt.to_period('M')
         .value_counts()
         .sort_index()
         .plot(kind="bar")
)

Number of transactions per month:

[Image: bar chart of the number of transactions per month]

How do I sum the amounts for each month and plot the result? How should I extend the code above to do this?

I tried using .sum() but did not succeed.

asked Apr 07 '18 21:04 by user40


1 Answer

PeriodIndex solution:

Group by month period (created with to_period) and aggregate with sum:

df['amount'].groupby(df['last_payout'].dt.to_period('M')).sum().plot(kind='bar')
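As a self-contained sketch of the PeriodIndex approach, the snippet below rebuilds the sample rows from the Setup section of this answer and computes the monthly totals without plotting. The variable names month and monthly_total are only illustrative.

```python
import pandas as pd

# Sample rows copied from the Setup section of this answer.
df = pd.DataFrame({
    "last_payout": pd.to_datetime([
        "2017-02-14 11:00:06",
        "2017-03-14 11:00:06",
        "2017-03-14 11:00:06",
        "2017-04-14 11:00:06",
        "2017-04-14 11:00:06",
    ]),
    "amount": [23401.0, 1444.0, 0.0, 0.0, 290083.0],
})

# Map every timestamp to its calendar month (a Period such as 2017-02).
month = df["last_payout"].dt.to_period("M")

# Group the amounts by that month key and add them up.
monthly_total = df["amount"].groupby(month).sum()

print(monthly_total)
```

Calling monthly_total.plot(kind='bar') on the result should give the bar chart; because the index holds periods, the tick labels are already in year-month form.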

DatetimeIndex solutions:

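Use resample by month end (M) or month start (MS) and aggregate with sum. Note that in pandas 2.2 and later the M alias is deprecated for resample in favor of ME; MS is unchanged: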

s = df.resample('M', on='last_payout')['amount'].sum()
# alternative:
# s = df.groupby(pd.Grouper(freq='M', key='last_payout'))['amount'].sum()
print(s)
last_payout
2017-02-28     23401.0
2017-03-31      1444.0
2017-04-30    290083.0
Freq: M, Name: amount, dtype: float64

Or:

s = df.resample('MS', on='last_payout')['amount'].sum()
# alternative:
# s = df.groupby(pd.Grouper(freq='MS', key='last_payout'))['amount'].sum()
print(s)
last_payout
2017-02-01     23401.0
2017-03-01      1444.0
2017-04-01    290083.0
Freq: MS, Name: amount, dtype: float64
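One practical difference between the PeriodIndex and DatetimeIndex solutions may be worth knowing: resample creates a bin for every month in the range, so a month with no rows shows up with a sum of 0, while grouping by to_period only returns months that actually occur in the data. The sketch below illustrates this with made-up rows (not the asker's data) that skip March.

```python
import pandas as pd

# Hypothetical rows for illustration only: there is no payout in March.
df = pd.DataFrame({
    "last_payout": pd.to_datetime([
        "2017-02-14 11:00:06",
        "2017-04-14 11:00:06",
        "2017-04-20 09:30:00",
    ]),
    "amount": [100.0, 50.0, 25.0],
})

# resample builds one bin per month start between the first and last row,
# so the empty March bin is kept and sums to 0.0.
by_resample = df.resample("MS", on="last_payout")["amount"].sum()

# groupby on the month period only yields months present in the data.
by_period = df["amount"].groupby(df["last_payout"].dt.to_period("M")).sum()

print(by_resample)
print(by_period)
```

If the bar chart should show gaps for months without payouts, the resample variant is probably the better fit; if empty months should be left out, the to_period variant does that without extra filtering.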

Then it is necessary to format the x-axis labels, because a bar plot of a DatetimeIndex shows full timestamps by default:

ax = s.plot(kind='bar')
ax.set_xticklabels(s.index.strftime('%Y-%m'))

[Image: bar chart of the summed amount per month, with x-axis labels in year-month format]
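Putting the DatetimeIndex solution and the label formatting together, a minimal end-to-end sketch could look like the following. It assumes matplotlib is installed and selects the non-interactive Agg backend so that it also runs without a display; the axis titles and the rotation value are arbitrary choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import pandas as pd

# Sample rows copied from the Setup section of this answer.
df = pd.DataFrame({
    "last_payout": pd.to_datetime([
        "2017-02-14 11:00:06",
        "2017-03-14 11:00:06",
        "2017-03-14 11:00:06",
        "2017-04-14 11:00:06",
        "2017-04-14 11:00:06",
    ]),
    "amount": [23401.0, 1444.0, 0.0, 0.0, 290083.0],
})

# Sum the amounts per month, keyed by the first day of each month.
s = df.resample("MS", on="last_payout")["amount"].sum()

# Build year-month strings to replace the default full-timestamp labels.
labels = list(s.index.strftime("%Y-%m"))

ax = s.plot(kind="bar")
ax.set_xticklabels(labels, rotation=45)
ax.set_xlabel("month")          # illustrative axis title
ax.set_ylabel("total amount")   # illustrative axis title
```

To write the chart to disk, ax.figure.savefig with a file name of your choice can be called afterwards.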

Setup:

import pandas as pd
from io import StringIO  # pd.compat.StringIO was removed from newer pandas versions

temp = u"""last_payout,amount
2017-02-14 11:00:06,23401.0
2017-03-14 11:00:06,1444.0
2017-03-14 11:00:06,0.0
2017-04-14 11:00:06,0.0
2017-04-14 11:00:06,290083.0"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=[0])
print(df)
          last_payout    amount
0 2017-02-14 11:00:06   23401.0
1 2017-03-14 11:00:06    1444.0
2 2017-03-14 11:00:06       0.0
3 2017-04-14 11:00:06       0.0
4 2017-04-14 11:00:06  290083.0
answered Nov 01 '22 15:11 by jezrael