Sort by certain order (Situation: pandas DataFrame Groupby)

Tags: python, sorting, pandas

I want to change the order of the days in the output of the code below.
The result should follow the order (Mon, Tue, Wed, Thu, Fri, Sat, Sun).
In other words, how do I sort by key in a predefined order?

Here is the code that needs tweaking:

f8 = df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time'].sum()

print(f8)

Current result:

device_id                         day
device_112                        Thu     436518
                                  Wed     636451
                                  Fri     770307
                                  Tue     792066
                                  Mon     826862
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Thu    2857429
                                  Tue    3303173
                                  Fri    3548178
                                  Wed    3822616
                                  Sun    4213633
                                  Sat    4475221

Desired result:

device_id                         day
device_112                        Mon     826862  
                                  Tue     792066
                                  Wed     636451 
                                  Thu     436518
                                  Fri     770307
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Tue    3303173
                                  Wed    3822616
                                  Thu    2857429
                                  Fri    3548178
                                  Sat    4475221
                                  Sun    4213633

Here, type(df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time']) is pandas.core.groupby.SeriesGroupBy.

I found .sort_values(), but it sorts by value, not by a custom key order.
I would appreciate pointers on how to impose such an order so that I can use the result in further data manipulation.
Thanks in advance.


asked Sep 01 '16 15:09

SUNDONG

2 Answers

It took me some time, but I found a solution: reindex does what you want. Here is an example:

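import pandas as pd

# Build a small frame with a device column, a day column and a value column.
a = [1, 2] * 2 + [2, 1] * 3 + [1, 2]
b = ['Mon', 'Wed', 'Thu', 'Fri'] * 3
c = list(range(12))
df = pd.DataFrame(data=[a,b,c]).T
df.columns = ['device', 'day', 'value']
# Group and sum; the 'day' level comes out in alphabetical order.
df = df.groupby(['device', 'day']).sum()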

gives:

            value
device day       
1      Fri      7
       Mon      0
       Thu     12
       Wed     14
2      Fri     14
       Mon     12
       Thu      6
       Wed      1

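Then reindex on the 'day' level, listing the days in the order you want:

df.reindex(['Mon', 'Wed', 'Thu', 'Fri'], level='day')

or, more conveniently (credit to burhan), use the standard calendar module, whose day_abbr holds the abbreviated day names starting with Monday (the names depend on the current locale):

import calendar
df.reindex(list(calendar.day_abbr), level='day')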

gives:

            value
device day       
1      Mon      0
       Wed     14
       Thu     12
       Fri      7
2      Mon     12
       Wed      1
       Thu      6
       Fri     14
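
The same idea applied to a Series shaped like the one in the question can be sketched as follows. The frame df_toy and its numbers are made up for illustration (they stand in for df_toy_indoor2), and the day list is written out by hand so the result does not depend on the locale:

```python
import pandas as pd

# Hypothetical stand-in for df_toy_indoor2; the values are invented.
df_toy = pd.DataFrame({
    "device_id": ["device_112"] * 4 + ["device_223"] * 3,
    "day": ["Thu", "Mon", "Sun", "Mon", "Wed", "Mon", "Sat"],
    "dwell_time": [10, 20, 30, 5, 7, 8, 9],
})

# Desired order of the 'day' level.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# groupby sorts the 'day' level alphabetically by default.
f8 = df_toy.groupby(["device_id", "day"])["dwell_time"].sum()

# Reorder only the 'day' level; the 'device_id' grouping is kept.
f8_ordered = f8.reindex(days, level="day")
print(f8_ordered)
```

Days from the list that never occur for a device (Tue for device_112 here) are not added as empty rows; only the existing entries are reordered.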

answered Sep 20 '22 15:09

PdevG

Convert the 'day' column to a categorical dtype, and make sure the list of categories is in the order you want. groupby will then sort the groups in that order automatically, and any other sort on the column will follow the same order.

import numpy as np
import pandas as pd

# Initial setup.
np.random.seed([3,1415])
n = 100
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
df = pd.DataFrame({
    'device_id': np.random.randint(1,3,n),
    'day': np.random.choice(days, n),
    'dwell_time':np.random.random(n)
    })

# Set as an ordered category, then group; groupby sorts by category order.
category_day = pd.api.types.CategoricalDtype(categories=days, ordered=True)
df['day'] = df['day'].astype(category_day)
df = df.groupby(['device_id', 'day']).sum()

Update: the original version of this answer used df['day'].astype("category", categories=days, ordered=True). astype no longer accepts categories, so build a CategoricalDtype as shown above.

The resulting output:

               dwell_time
device_id day            
1         Mon    4.428626
          Tue    3.259319
          Wed    2.436024
          Thu    0.909724
          Fri    4.974137
          Sat    5.583778
          Sun    2.687258
2         Mon    3.117923
          Tue    2.427154
          Wed    1.943927
          Thu    4.599547
          Fri    2.628887
          Sat    6.247520
          Sun    2.716886

Note that this method works for any type of customized sorting. For example, if you had a column with entries 'a', 'b', 'c', and wanted it to be sorted in a non-standard order, e.g. 'c', 'a', 'b', you'd just do the same type of procedure: specify the column as categorical with your categories being in the non-standard order you want.
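
As a minimal sketch of that 'c', 'a', 'b' case (the column names letter and n and the data are invented for illustration), both sort_values and groupby follow the category order:

```python
import pandas as pd

# Hypothetical data; 'letter' should sort as c, a, b rather than a, b, c.
df = pd.DataFrame({
    "letter": ["a", "b", "c", "a", "c"],
    "n": [1, 2, 3, 4, 5],
})

# The order of `categories` defines the sort order.
custom_order = pd.api.types.CategoricalDtype(categories=["c", "a", "b"], ordered=True)
df["letter"] = df["letter"].astype(custom_order)

# Sorting the column now follows the category order.
sorted_df = df.sort_values("letter", kind="stable")

# Grouping follows it as well; observed=True keeps only categories present in the data.
totals = df.groupby("letter", observed=True)["n"].sum()
print(sorted_df)
print(totals)
```

Values that are not listed in categories become NaN when the column is converted, so the category list should cover every value that occurs in the column.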

answered Sep 18 '22 15:09

root
