
How to calculate sum of column per day in pandas?

Tags: python, pandas

I have a dataset that looks like this:

                      time  raccoons_bought          x          y
22443  1984-01-01 00:00:01                1  55.776462  37.593956
2143   1984-01-01 00:00:01                4  55.757121  37.378225
9664   1984-01-01 00:00:33                3  55.773702  37.599220
33092  1984-01-01 00:01:39                3  55.757121  37.378225
16697  1984-01-01 00:02:32                2  55.678549  37.583023

I need to calculate how many raccoons were bought per day, so what I do is set time as the index:

df = df.set_index(['time'])

then group the dataset by the date and count:

df.groupby(df.index.date).count()

But before grouping I need to delete the x and y columns, which contain coordinates.

If I don't delete them, the result looks like this:

            raccoons_bought     x     y
1984-01-01             5497  5497  5497
1984-01-02             5443  5443  5443
1984-01-03             5488  5488  5488
1984-01-04             5453  5453  5453
1984-01-05             5536  5536  5536
1984-01-06             5634  5634  5634
1984-01-07             5468  5468  5468

If I delete them, the result looks fine:

            raccoons_bought
1984-01-01             5497
1984-01-02             5443
1984-01-03             5488
1984-01-04             5453
1984-01-05             5536
1984-01-06             5634
1984-01-07             5468

So my question is: how do I calculate raccoons_bought per day while keeping the coordinates untouched? I want to plot these coordinates on a map and find out who bought those raccoons.
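
For reference, here is a minimal sketch of the workaround described above (dropping the coordinate columns before counting), rebuilt from the sample rows at the top of the question:

import pandas as pd

df = pd.DataFrame({
    'time': pd.to_datetime(['1984-01-01 00:00:01', '1984-01-01 00:00:01',
                            '1984-01-01 00:00:33', '1984-01-01 00:01:39',
                            '1984-01-01 00:02:32']),
    'raccoons_bought': [1, 4, 3, 3, 2],
    'x': [55.776462, 55.757121, 55.773702, 55.757121, 55.678549],
    'y': [37.593956, 37.378225, 37.599220, 37.378225, 37.583023],
}).set_index('time')

# drop the coordinates, then count rows per day - this is the workaround,
# and it throws x and y away, which is exactly the problem
daily = df.drop(columns=['x', 'y']).groupby(df.index.date).count()
print(daily)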

Asked Dec 23 '22 by Anton
2 Answers

You can do something like this:

In [82]: df
Out[82]: 
                      time  raccoons_bought          x          y
22443  1984-01-01 00:00:01                1  55.776462  37.593956
2143   1984-01-01 00:00:01                4  55.757121  37.378225
9664   1984-01-01 00:00:33                3  55.773702  37.599220
33092  1984-01-01 00:01:39                3  55.757121  37.378225
16697  1984-01-01 00:02:32                2  55.678549  37.583023

In [83]: df.groupby(pd.to_datetime(df.time).dt.date).agg(
    ...:     {'raccoons_bought': 'sum', 'x':'first', 'y':'first'}).reset_index() 
Out[83]: 
         time          y          x  raccoons_bought
0  1984-01-01  37.593956  55.776462               13


Notice that I am using sum as the aggregation function for raccoons_bought to get the total; if you simply need the number of occurrences, change it to count or size.
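
For example, with the same df and pd from the transcript above (just a sketch), swapping the aggregation gives the number of purchase rows per day instead of the total sold:

# 'count'/'size' -> number of rows per day (5 here), 'sum' -> total raccoons (13 here)
per_day = df.groupby(pd.to_datetime(df.time).dt.date).agg(
    {'raccoons_bought': 'count', 'x': 'first', 'y': 'first'}).reset_index()
print(per_day)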

Answered Jan 12 '23 by Mohamed Ali JAMAOUI

You can use:

#if necessary, convert to datetime
df['time'] = pd.to_datetime(df['time'])
#thank you JoeCondron
#floor the timestamps to midnight, keeping datetime64 dtype - faster
dates = df['time'].dt.floor('D')
#if you need python date objects instead - slower
#dates = df['time'].dt.date

#aggregate with size if NaNs should be counted
#aggregate with count if NaNs should be omitted
df1 = df.groupby(dates).size()
print (df1)
time
1984-01-01    5
dtype: int64

#if you need the sums instead of counts
df11 = df.groupby(dates)['raccoons_bought'].sum().reset_index()
print (df11)
         time  raccoons_bought
0  1984-01-01               13

If you don't want to change the original columns, use transform with sum (or size or count):

a = df.groupby(dates)['raccoons_bought'].transform('sum')
print (a)
22443    13
2143     13
9664     13
33092    13
16697    13
Name: raccoons_bought, dtype: int64
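
Because the transform result is aligned with the original index, one way to get what the question asks for (a sketch of mine, reusing df and dates from the snippets above; the column name raccoons_per_day is just an example) is to attach it as a new column, leaving x and y untouched:

# per-day total on every row; the coordinate columns stay exactly as they were
df['raccoons_per_day'] = df.groupby(dates)['raccoons_bought'].transform('sum')
print(df)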

Then you can filter all rows by a condition:

mask = df.groupby(dates)['raccoons_bought'].transform('sum') > 4
df2 = df.loc[mask, 'raccoons_bought']
print (df2)
22443    1
2143     4
9664     3
33092    3
16697    2
Name: raccoons_bought, dtype: int64

If you need the unique values as a list:

df2 = df.loc[mask, 'raccoons_bought'].unique().tolist()
print (df2)
[1, 4, 3, 2]
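
And since the question mentions plotting the buyers on a map, the same mask can also pull out the coordinate columns (again only a sketch, reusing mask and df from above):

# coordinates of purchases on days where more than 4 raccoons were sold
coords = df.loc[mask, ['x', 'y']]
print(coords)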
Answered Jan 12 '23 by jezrael