pandas groupby dates and years and sum up amounts

I have a pandas dataframe like this:

import pandas as pd

d = {'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25', '700.00', '350.00', '200.25', '2340.00'], 'date': ['22-01-2010','22-01-2010','23-01-2010','15-02-2010','27-02-2010','07-03-2010','14-01-2011','09-10-2011','28-07-2012']}
df = pd.DataFrame(data=d)

df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
pd.options.display.float_format = '{:,.4f}'.format
df['dollar_amount'] = df['dollar_amount'].astype(float)
df

    date        dollar_amount
0   22-01-2010  200.25
1   22-01-2010  350.00
2   23-01-2010  120.00
3   15-02-2010  400.50
4   27-02-2010  1231.25
5   07-03-2010  700.00
6   14-01-2011  350.00
7   09-10-2011  200.25
8   11-11-2011  2340.00
9   12-12-2011  144.50
10  12-09-2012  760.00
11  22-10-2012  255.00
12  28-07-2012  650.00

I want to sum amounts for each day in each year. So I am dividing the years like this:

date1 = df[(df['date'] >= '2010-01-01') & (df['date'] < '2011-01-01')]
date2 = df[(df['date'] >= '2011-01-01') & (df['date'] < '2012-01-01')]
date3 = df[(df['date'] >= '2012-01-01') & (df['date'] < '2013-01-01')]

So now I have 3 dataframes: date1 holds the dates from 2010, date2 the dates from 2011, and date3 the dates from 2012.

Let's look at date1:

print(type(date1))
date1

<class 'pandas.core.frame.DataFrame'>

    date        dollar_amount
0   2010-01-22  200.2500
1   2010-01-22  350.0000
2   2010-01-23  120.0000
3   2010-02-15  400.5000
4   2010-02-27  1,231.2500
5   2010-03-07  700.0000

Next I am summing up the amounts date wise, so I am grouping on date using this:

date1 = date1.groupby('date', as_index=False).sum()
date1 = date1[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

date2 = date2.groupby('date', as_index=False).sum()
date2 = date2[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

date3 = date3.groupby('date', as_index=False).sum()
date3 = date3[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

Let's look at the dataframe date1 now:

date1

    date        dollar_amount
0   2010-01-22  550.2500
1   2010-01-23  120.0000
2   2010-02-15  400.5000
3   2010-02-27  1,231.2500
4   2010-03-07  700.0000

This is just sorting them in ascending date wise order:

date1 = date1[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

Now I have the date-wise sum of dollar amounts for each year in separate dataframes, and I plot a trace for each year. It works and does the job, but the code is very redundant: if I had data from 2000 to 2017, I would have to copy and paste the same piece of code 18 times. I don't think this is an efficient way of doing it.

I am sure there must be a better way of doing this, but I can't figure out how. Kindly help me. Thanks.

asked Jan 15 '18 at 11:01 by el323


1 Answer

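I think you can sum the amounts per date first and then add the year as an extra index level (a MultiIndex), so each year can be selected from the output:

df1 = df.groupby('date', as_index=False)['dollar_amount'].sum()
# take the year from df1 (the aggregated frame) so the lengths match
df1 = df1.set_index(df1['date'].rename('year').dt.year, append=True).swaplevel(0,1)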
print (df1)
             date  dollar_amount
year                            
2010 0 2010-01-22       550.2500
     1 2010-01-23       120.0000
     2 2010-02-15       400.5000
     3 2010-02-27     1,231.2500
     4 2010-03-07       700.0000
2011 5 2011-01-14       350.0000
     6 2011-10-09       200.2500
2012 7 2012-07-28     2,340.0000

print (df1.loc[2010])
        date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000

print (df1.loc[2011])
        date  dollar_amount
5 2011-01-14       350.0000
6 2011-10-09       200.2500

print (df1.loc[2012])
        date  dollar_amount
7 2012-07-28     2,340.0000
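
As a possible variation on the same idea, the year and the date can also be used together as grouping keys in a single groupby call. This is only a minimal sketch, assuming a Series indexed by (year, date) is acceptable instead of a DataFrame; the names year, daily and totals_2010 are made up for illustration.

```python
import pandas as pd

# Same sample data as in the question.
d = {'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25',
                       '700.00', '350.00', '200.25', '2340.00'],
     'date': ['22-01-2010', '22-01-2010', '23-01-2010', '15-02-2010',
              '27-02-2010', '07-03-2010', '14-01-2011', '09-10-2011',
              '28-07-2012']}
df = pd.DataFrame(data=d)
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df['dollar_amount'] = df['dollar_amount'].astype(float)

# Group by the year (derived from the date) and by the date itself in one step.
year = df['date'].dt.year.rename('year')
daily = df.groupby([year, 'date'])['dollar_amount'].sum()

# Selecting on the first index level returns that year's daily totals.
totals_2010 = daily.loc[2010]
print(totals_2010)
```

The result is sorted by year and date, because groupby sorts the group keys by default, so no separate sort_values call should be needed.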

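If you want to create a dictionary of DataFrames, one per year:

# start again from the per-date sums with a plain integer index
df1 = df.groupby('date', as_index=False)['dollar_amount'].sum()
d = dict(tuple(df1.groupby(df1['date'].dt.year)))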
print (d)

print (d[2010])
        date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000

print (d[2011])
        date  dollar_amount
5 2011-01-14       350.0000
6 2011-10-09       200.2500

print (d[2012])
        date  dollar_amount
7 2012-07-28     2,340.0000
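
With such a dictionary, the per-year work from the question (for example one plot trace per year) could probably be done in a single loop instead of copied code. The following is a rough sketch on a shortened version of the sample data; the names frames and yearly_totals are hypothetical, and the sum inside the loop only stands in for whatever plotting call is actually used.

```python
import pandas as pd

# Shortened sample data (an assumption for brevity, not the full question data).
df = pd.DataFrame({
    'date': pd.to_datetime(['22-01-2010', '22-01-2010', '14-01-2011', '28-07-2012'],
                           format='%d-%m-%Y'),
    'dollar_amount': [200.25, 350.00, 350.00, 2340.00],
})

# Sum per date, then split the result into one DataFrame per year.
daily = df.groupby('date', as_index=False)['dollar_amount'].sum()
frames = dict(tuple(daily.groupby(daily['date'].dt.year)))

yearly_totals = {}
for year, frame in frames.items():
    # A plotting call for this year's trace would go here (placeholder step).
    yearly_totals[int(year)] = frame['dollar_amount'].sum()

print(yearly_totals)
```

The loop body runs once per year present in the data, so covering 2000 to 2017 would not need any extra code.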

answered Oct 12 '22 at 13:10 by jezrael