I have the following dataframe:
some_id
2016-12-26 11:03:10 001
2016-12-26 11:03:13 001
2016-12-26 12:03:13 001
2016-12-26 12:03:13 008
2016-12-27 11:03:10 009
2016-12-27 11:03:13 009
2016-12-27 12:03:13 003
2016-12-27 12:03:13 011
And I need to do something like transform('size') with a subsequent sort, then take the N largest values per day, to get something like this (N=2):
some_id size
2016-12-26 001 3
008 1
2016-12-27 009 2
003 1
Is there an elegant way to do that in pandas 0.19.x?
Use value_counts to compute distinct counts after grouping on the date part of your DatetimeIndex. It sorts the counts in descending order by default, so you only need to keep the topmost 2 rows within each group to get the largest (top-2) part.
# count ids within each calendar day and keep the two largest counts
fnc = lambda x: x.value_counts().head(2)
grp = df.groupby(df.index.date)['some_id'].apply(fnc).reset_index(1, name='size')
grp.rename(columns={'level_1': 'some_id'})
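To see why the reset_index(1, name='size') step is there: apply returns a Series with a two-level index of (date, id), so moving level 1 out into a column leaves the counts as a column we can name 'size'. A self-contained sketch of the same idea, with the question's data inlined (variable names here are my own):

```python
import pandas as pd

# Sample data from the question; ids kept as zero-padded strings.
idx = pd.to_datetime([
    "2016-12-26 11:03:10", "2016-12-26 11:03:13",
    "2016-12-26 12:03:13", "2016-12-26 12:03:13",
    "2016-12-27 11:03:10", "2016-12-27 11:03:13",
    "2016-12-27 12:03:13", "2016-12-27 12:03:13",
])
df = pd.DataFrame({"some_id": ["001", "001", "001", "008",
                               "009", "009", "003", "011"]}, index=idx)

# value_counts sorts descending, so head(2) is the top-2 per day.
top2 = df.groupby(df.index.date)["some_id"].apply(
    lambda s: s.value_counts().head(2)
)
# top2 has a two-level index (date, id); move the id level into a
# column and name the counts column 'size'.
grp = top2.reset_index(1, name="size").rename(columns={"level_1": "some_id"})
```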
setup
from io import StringIO
import pandas as pd
txt = """some_id
2016-12-26 11:03:10  001
2016-12-26 11:03:13  001
2016-12-26 12:03:13  001
2016-12-26 12:03:13  008
2016-12-27 11:03:10  009
2016-12-27 11:03:13  009
2016-12-27 12:03:13  003
2016-12-27 12:03:13  011"""
# columns are separated by two or more spaces, so use a regex separator
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')
df.index = pd.to_datetime(df.index)
df.some_id = df.some_id.astype(str).str.zfill(3)
df
some_id
2016-12-26 11:03:10 001
2016-12-26 11:03:13 001
2016-12-26 12:03:13 001
2016-12-26 12:03:13 008
2016-12-27 11:03:10 009
2016-12-27 11:03:13 009
2016-12-27 12:03:13 003
2016-12-27 12:03:13 011
using nlargest
df.groupby(pd.TimeGrouper('D')).some_id.value_counts() \
.groupby(level=0, group_keys=False).nlargest(2)
some_id
2016-12-26 001 3
008 1
2016-12-27 009 2
003 1
Name: some_id, dtype: int64
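pd.TimeGrouper works in pandas 0.19.x as asked, but it was deprecated in later versions and eventually removed; its replacement is pd.Grouper with a freq argument. A sketch of the same nlargest approach for modern pandas, with the data inlined to keep it self-contained:

```python
import pandas as pd

# Sample data from the question; ids kept as zero-padded strings.
idx = pd.to_datetime([
    "2016-12-26 11:03:10", "2016-12-26 11:03:13",
    "2016-12-26 12:03:13", "2016-12-26 12:03:13",
    "2016-12-27 11:03:10", "2016-12-27 11:03:13",
    "2016-12-27 12:03:13", "2016-12-27 12:03:13",
])
df = pd.DataFrame({"some_id": ["001", "001", "001", "008",
                               "009", "009", "003", "011"]}, index=idx)

# pd.Grouper(freq="D") plays the role of the old pd.TimeGrouper('D');
# value_counts sorts within each day, nlargest(2) keeps the top 2.
out = (
    df.groupby(pd.Grouper(freq="D"))["some_id"]
      .value_counts()
      .groupby(level=0, group_keys=False)
      .nlargest(2)
)
print(out)
```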