Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort by certain order (Situation: pandas DataFrame Groupby)

I want to change the day of order presented by below code.
What I want is a result with the order (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
- should I say, sort by key in certain predefined order?


Here is my code which needs some tweak:

f8 = df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time'].sum()

print(f8)

Current result:

device_id                         day
device_112                        Thu     436518
                                  Wed     636451
                                  Fri     770307
                                  Tue     792066
                                  Mon     826862
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Thu    2857429
                                  Tue    3303173
                                  Fri    3548178
                                  Wed    3822616
                                  Sun    4213633
                                  Sat    4475221

Desired result:

device_id                         day
device_112                        Mon     826862  
                                  Tue     792066
                                  Wed     636451 
                                  Thu     436518
                                  Fri     770307
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Tue    3303173
                                  Wed    3822616
                                  Thu    2857429
                                  Fri    3548178
                                  Sat    4475221
                                  Sun    4213633

Here, type(df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time']) is a class 'pandas.core.groupby.SeriesGroupBy'.

I have found .sort_values() , but it is a built-in sort function by values.
I want to get some pointers to set some order to use it further data manipulation.
Thanks in advance.

like image 814
SUNDONG Avatar asked Sep 01 '16 15:09

SUNDONG


People also ask

How do you sort a DataFrame in Groupby object?

Sort within Groups of groupby() Result in DataFrameBy using DataFrame. sort_values() , you can sort DataFrame in ascending or descending order, before you use this first group the DataFrame rows by using DataFrame. groupby() method. Note that groupby preserves the order of rows within each group.

Does Pandas Groupby maintain order?

Groupby preserves the order of rows within each group.

Does Groupby sort data Pandas?

Pandas Groupby is used in situations where we want to split data and set into groups so that we can do various operations on those groups like – Aggregation of data, Transformation through some group computations or Filtration according to specific conditions applied on the groups.


2 Answers

Took me some time, but I found the solution. reindex does what you want. See my code example:

a = [1, 2] * 2 + [2, 1] * 3 + [1, 2]
b = ['Mon', 'Wed', 'Thu', 'Fri'] * 3
c = list(range(12))
df = pd.DataFrame(data=[a,b,c]).T
df.columns = ['device', 'day', 'value']
df = df.groupby(['device', 'day']).sum()

gives:

            value
device day       
1      Fri      7
       Mon      0
       Thu     12
       Wed     14
2      Fri     14
       Mon     12
       Thu      6
       Wed      1

Then doing reindex:

df.reindex(['Mon', 'Wed', 'Thu', 'Fri'], level='day')

or more conveniently (credits to burhan)

df.reindex(list(calendar.day_abbr), level='day')

gives:

            value
device day       
1      Mon      0
       Wed     14
       Thu     12
       Fri      7
2      Mon     12
       Wed      1
       Thu      6
       Fri     14
like image 59
PdevG Avatar answered Sep 20 '22 15:09

PdevG


Set the 'day' column as categorical dtype, just make sure when you set the category your list of days is sorted as you'd like it to be. Performing the groupby will then automatically sort it for you, but if you otherwise tried to sort the column it will sort in the correct order that you specify.

# Initial setup.
np.random.seed([3,1415])
n = 100
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
df = pd.DataFrame({
    'device_id': np.random.randint(1,3,n),
    'day': np.random.choice(days, n),
    'dwell_time':np.random.random(n)
    })


# Set as category, groupby, and sort.
df['day'] = df['day'].astype("category", categories=days, ordered=True)
df = df.groupby(['device_id', 'day']).sum()

Update: astype no longer accepts categories, use:

category_day = pd.api.types.CategoricalDtype(categories=days, ordered=True)
df['day'] = df['day'].astype(category_day)

The resulting output:

               dwell_time
device_id day            
1         Mon    4.428626
          Tue    3.259319
          Wed    2.436024
          Thu    0.909724
          Fri    4.974137
          Sat    5.583778
          Sun    2.687258
2         Mon    3.117923
          Tue    2.427154
          Wed    1.943927
          Thu    4.599547
          Fri    2.628887
          Sat    6.247520
          Sun    2.716886

Note that this method works for any type of customized sorting. For example, if you had a column with entries 'a', 'b', 'c', and wanted it to be sorted in a non-standard order, e.g. 'c', 'a', 'b', you'd just do the same type of procedure: specify the column as categorical with your categories being in the non-standard order you want.

like image 30
root Avatar answered Sep 18 '22 15:09

root