I want to change the day of order presented by below code.
What I want is a result with the order (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
- should I say, sort by key in certain predefined order?
Here is my code which needs some tweak:
f8 = df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time'].sum()
print(f8)
Current result:
device_id day
device_112 Thu 436518
Wed 636451
Fri 770307
Tue 792066
Mon 826862
Sat 953503
Sun 1019298
device_223 Mon 2534895
Thu 2857429
Tue 3303173
Fri 3548178
Wed 3822616
Sun 4213633
Sat 4475221
Desired result:
device_id day
device_112 Mon 826862
Tue 792066
Wed 636451
Thu 436518
Fri 770307
Sat 953503
Sun 1019298
device_223 Mon 2534895
Tue 3303173
Wed 3822616
Thu 2857429
Fri 3548178
Sat 4475221
Sun 4213633
Here, type(df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time'])
is a class 'pandas.core.groupby.SeriesGroupBy'.
I have found .sort_values()
, but it is a built-in sort function by values.
I want to get some pointers to set some order to use it further data manipulation.
Thanks in advance.
Sort within Groups of groupby() Result in DataFrameBy using DataFrame. sort_values() , you can sort DataFrame in ascending or descending order, before you use this first group the DataFrame rows by using DataFrame. groupby() method. Note that groupby preserves the order of rows within each group.
Groupby preserves the order of rows within each group.
Pandas Groupby is used in situations where we want to split data and set into groups so that we can do various operations on those groups like – Aggregation of data, Transformation through some group computations or Filtration according to specific conditions applied on the groups.
Took me some time, but I found the solution. reindex does what you want. See my code example:
a = [1, 2] * 2 + [2, 1] * 3 + [1, 2]
b = ['Mon', 'Wed', 'Thu', 'Fri'] * 3
c = list(range(12))
df = pd.DataFrame(data=[a,b,c]).T
df.columns = ['device', 'day', 'value']
df = df.groupby(['device', 'day']).sum()
gives:
value
device day
1 Fri 7
Mon 0
Thu 12
Wed 14
2 Fri 14
Mon 12
Thu 6
Wed 1
Then doing reindex:
df.reindex(['Mon', 'Wed', 'Thu', 'Fri'], level='day')
or more conveniently (credits to burhan)
df.reindex(list(calendar.day_abbr), level='day')
gives:
value
device day
1 Mon 0
Wed 14
Thu 12
Fri 7
2 Mon 12
Wed 1
Thu 6
Fri 14
Set the 'day'
column as categorical dtype, just make sure when you set the category your list of days is sorted as you'd like it to be. Performing the groupby
will then automatically sort it for you, but if you otherwise tried to sort the column it will sort in the correct order that you specify.
# Initial setup.
np.random.seed([3,1415])
n = 100
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
df = pd.DataFrame({
'device_id': np.random.randint(1,3,n),
'day': np.random.choice(days, n),
'dwell_time':np.random.random(n)
})
# Set as category, groupby, and sort.
df['day'] = df['day'].astype("category", categories=days, ordered=True)
df = df.groupby(['device_id', 'day']).sum()
Update: astype no longer accepts categories, use:
category_day = pd.api.types.CategoricalDtype(categories=days, ordered=True)
df['day'] = df['day'].astype(category_day)
The resulting output:
dwell_time
device_id day
1 Mon 4.428626
Tue 3.259319
Wed 2.436024
Thu 0.909724
Fri 4.974137
Sat 5.583778
Sun 2.687258
2 Mon 3.117923
Tue 2.427154
Wed 1.943927
Thu 4.599547
Fri 2.628887
Sat 6.247520
Sun 2.716886
Note that this method works for any type of customized sorting. For example, if you had a column with entries 'a', 'b', 'c'
, and wanted it to be sorted in a non-standard order, e.g. 'c', 'a', 'b'
, you'd just do the same type of procedure: specify the column as categorical with your categories being in the non-standard order you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With