pandas groupby dates and years and sum up amounts

Tags:

I have pandas dataframe like this:

d = {'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25', '700.00', '350.00', '200.25', '2340.00'], 'date': ['22-01-2010','22-01-2010','23-01-2010','15-02-2010','27-02-2010','07-03-2010','14-01-2011','09-10-2011','28-07-2012']}
df = pd.DataFrame(data=d)

df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
pd.options.display.float_format = '{:,.4f}'.format
df['dollar_amount'] = df['dollar_amount'].astype(float)
df

    date        dollar_amount
0   22-01-2010  200.25
1   22-01-2010  350.00
2   23-01-2010  120.00
3   15-02-2010  400.50
4   27-02-2010  1231.25
5   07-03-2010  700.00
6   14-01-2011  350.00
7   09-10-2011  200.25
8   11-11-2011  2340.00
9   12-12-2011  144.50
10  12-09-2012  760.00
11  22-10-2012  255.00
12  28-07-2012  650.00

I want to sum amounts for each day in each year. So I am dividing the years like this:

date1 = df[(df['date'] >= '2010-01-01') & (df['date'] < '2011-01-01')]
date2 = df[(df['date'] >= '2011-01-01') & (df['date'] < '2012-01-01')]
date3 = df[(df['date'] >= '2012-01-01') & (df['date'] < '2013-01-01')]

So now I have 3 dataframes with dates from the year 2010 in date1 dataframe, dates from the year 2011 in date2 and dates from 2012 in date3.

Lets look at date1:

print type(date1)
date1

<class 'pandas.core.frame.DataFrame'>

    date        dollar_amount
0   2010-01-22  200.2500
1   2010-01-22  350.0000
2   2010-01-23  120.0000
3   2010-02-15  400.5000
4   2010-02-27  1,231.2500
5   2010-03-07  700.0000

Next I am summing up the amounts date wise, so I am grouping on date using this:

date1 = date1.groupby('date', as_index=False).sum()
date1 = date1[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

date2 = date2.groupby('date', as_index=False).sum()
date2 = date2[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

date3 = date3.groupby('date', as_index=False).sum()
date3 = date3[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

Let's look at the dateframe date1 now:

date1

date        dollar_amount
0   2010-01-22  550.2500
1   2010-01-23  120.0000
2   2010-02-15  400.5000
3   2010-02-27  1,231.2500
4   2010-03-07  700.0000

This is just sorting them in ascending date wise order:

date1 = date1[['date','dollar_amount']].sort_values(by=['date'], 
ascending=True)

Now I have got the date wise sum of dollarAmounts for each year in different dataframes. Then I am plotting traces for each year. Its working fine and fulfilling the task. But this code is very redundant and I am copying the same code and if I have say data from year 2000 to 2017 then I will have to copy and paste the same piece of code 18 times. I think its not very effective way of doing this.

I am sure there must be a better way of doing this but I cant figure out how. Kindly help me. Thanks.

528

asked Jan 15 '18 11:01

el323

1 Answers

I think you can create MultiIndex by years to output:

df1 = df.groupby('date', as_index=False)['dollar_amount'].sum()
df1 = df1.set_index(df['date'].rename('year').dt.year, append=True).swaplevel(0,1)
print (df1)
             date  dollar_amount
year                            
2010 0 2010-01-22       550.2500
     1 2010-01-23       120.0000
     2 2010-02-15       400.5000
     3 2010-02-27     1,231.2500
     4 2010-03-07       700.0000
2011 5 2011-01-14       350.0000
     6 2011-10-09       200.2500
2012 7 2012-07-28     2,340.0000

print (df1.loc[2010])
        date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000

print (df1.loc[2011])
        date  dollar_amount
5 2011-01-14       350.0000
6 2011-10-09       200.2500

print (df1.loc[2012])
        date  dollar_amount
7 2012-07-28     2,340.0000

If want create dictionary of DataFrames:

d = dict(tuple(df.groupby(df['date'].dt.year)))
print (d)

print (d[2010])
        date  dollar_amount
0 2010-01-22       550.2500
1 2010-01-23       120.0000
2 2010-02-15       400.5000
3 2010-02-27     1,231.2500
4 2010-03-07       700.0000

print (d[2011])
        date  dollar_amount
5 2011-01-14       350.0000
6 2011-10-09       200.2500

print (d[2012])
        date  dollar_amount
7 2012-07-28     2,340.0000

187

answered Oct 12 '22 13:10

jezrael

Related questions
                            
                                why is a console app affected by having SSMS open
                            
                                Accessing localStorage with Puppeteer and no url
                            
                                Comparing MFCC feature vectors with DTW
                            
                                Minimum common remainder of division
                            
                                Create a hole in react-native webview for touch events
                            
                                sorted() function differing from results obtained via list comprehension
                            
                                react native google maps alternative [closed]
                            
                                Webpack chunking. No content appearing - chunks not loaded
                            
                                Can Large Object Heap Fragmentation cause OutOfMemory in 64bit processes?
                            
                                How would A* search a graph?
                            
                                Hook to function without delegates (Reflection)
                            
                                Removing space around wedge polar plots in Matplotlib

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With