I have a dataset that looks like this:
time raccoons_bought x y
22443 1984-01-01 00:00:01 1 55.776462 37.593956
2143 1984-01-01 00:00:01 4 55.757121 37.378225
9664 1984-01-01 00:00:33 3 55.773702 37.599220
33092 1984-01-01 00:01:39 3 55.757121 37.378225
16697 1984-01-01 00:02:32 2 55.678549 37.583023
I need to calculate how many raccoons were bought per day. So what I do is set time as the index:
df = df.set_index(['time'])
then sort the dataset by it and group by date:
df.groupby(df.index.date).count()
But before grouping I need to delete the x and y columns, which hold coordinates. If I don't delete them, the result looks like this:
raccoons_bought x y
1984-01-01 5497 5497 5497
1984-01-02 5443 5443 5443
1984-01-03 5488 5488 5488
1984-01-04 5453 5453 5453
1984-01-05 5536 5536 5536
1984-01-06 5634 5634 5634
1984-01-07 5468 5468 5468
If I delete them, the result looks fine:
raccoons_bought
1984-01-01 5497
1984-01-02 5443
1984-01-03 5488
1984-01-04 5453
1984-01-05 5536
1984-01-06 5634
1984-01-07 5468
So my question is: how do I calculate raccoons_bought per day while keeping the coordinates untouched? I want to plot those coordinates on a map and find out who bought the raccoons.
You can do something like this:
In [82]: df
Out[82]:
time raccoons_bought x y
22443 1984-01-01 00:00:01 1 55.776462 37.593956
2143 1984-01-01 00:00:01 4 55.757121 37.378225
9664 1984-01-01 00:00:33 3 55.773702 37.599220
33092 1984-01-01 00:01:39 3 55.757121 37.378225
16697 1984-01-01 00:02:32 2 55.678549 37.583023
In [83]: df.groupby(pd.to_datetime(df.time).dt.date).agg(
...: {'raccoons_bought': 'sum', 'x':'first', 'y':'first'}).reset_index()
Out[83]:
time y x raccoons_bought
0 1984-01-01 37.593956 55.776462 13
In [84]:
Notice that I am using sum as the aggregation function for raccoons_bought to get the total; if you simply need the number of occurrences, change it to count or size.
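To make the difference between those aggregations concrete, here is a minimal sketch (toy data, assumed, with one NaN added on purpose):

```python
import pandas as pd

# Toy frame: three rows on the same day, one missing raccoons_bought value
df = pd.DataFrame({
    'time': pd.to_datetime(['1984-01-01 00:00:01',
                            '1984-01-01 00:00:33',
                            '1984-01-01 00:01:39']),
    'raccoons_bought': [1, 4, None],  # NaN to show count vs size
})

g = df.groupby(df['time'].dt.date)['raccoons_bought']
print(g.sum())    # 5.0 -- total bought that day (NaN skipped)
print(g.count())  # 2   -- non-NaN rows only
print(g.size())   # 3   -- all rows, NaN included
```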
You can use:
# if necessary, convert to datetime
df['time'] = pd.to_datetime(df['time'])

# thank you JoeCondron
# trim the timestamps to the start of the day (datetime object, faster)
dates = df['time'].dt.floor('D')
# if a python date object is necessary (slower):
# dates = df['time'].dt.date
# aggregate with size if you want to count NaNs,
# or with count if you want to omit NaNs
df1 = df.groupby(dates).size()
print (df1)
time
1984-01-01 5
dtype: int64
#if need sums
df11 = df.groupby(dates)['raccoons_bought'].sum().reset_index()
print (df11)
time raccoons_bought
0 1984-01-01 13
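An alternative worth noting (a sketch on assumed toy data, not from the answer above) is resample, which does the same daily bucketing once the time column is a datetime:

```python
import pandas as pd

# Toy data spanning two days
df = pd.DataFrame({
    'time': pd.to_datetime(['1984-01-01 00:00:01',
                            '1984-01-01 00:00:33',
                            '1984-01-02 00:01:39']),
    'raccoons_bought': [1, 4, 3],
})

# resample('D', on='time') groups by calendar day without setting an index
daily = df.resample('D', on='time')['raccoons_bought'].sum()
print(daily)  # 1984-01-01 -> 5, 1984-01-02 -> 3
```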
If you don't want to change the original columns, you need transform with sum (or size, or count):
a = df.groupby(dates)['raccoons_bought'].transform('sum')
print (a)
22443 13
2143 13
9664 13
33092 13
16697 13
Name: raccoons_bought, dtype: int64
Then you can filter rows by a condition:
mask = df.groupby(dates)['raccoons_bought'].transform('sum') > 4
df2 = df.loc[mask, 'raccoons_bought']
print (df2)
22443 1
2143 4
9664 3
33092 3
16697 2
Name: raccoons_bought, dtype: int64
If you need the unique values as a list:
df2 = df.loc[mask, 'raccoons_bought'].unique().tolist()
print (df2)
[1, 4, 3, 2]
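Putting it together for the original goal (daily totals while keeping x and y for the map), one sketch using transform on the sample data from the question — the new column name bought_per_day is an assumption:

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    'time': pd.to_datetime(['1984-01-01 00:00:01', '1984-01-01 00:00:01',
                            '1984-01-01 00:00:33', '1984-01-01 00:01:39',
                            '1984-01-01 00:02:32']),
    'raccoons_bought': [1, 4, 3, 3, 2],
    'x': [55.776462, 55.757121, 55.773702, 55.757121, 55.678549],
    'y': [37.593956, 37.378225, 37.599220, 37.378225, 37.583023],
})

dates = df['time'].dt.floor('D')
# transform broadcasts the daily sum back onto every row,
# so x and y stay untouched and ready for plotting
df['bought_per_day'] = df.groupby(dates)['raccoons_bought'].transform('sum')
print(df[['time', 'raccoons_bought', 'bought_per_day', 'x', 'y']])
# bought_per_day is 13 on every 1984-01-01 row
```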