Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

sort Pandas dataframe with NaT values on top

Im trying to sort a pandas dataframe with NaT values on top. I'm using the df.sort_values function:

df=df.sort_values(by='date_of_last_hoorah_given')

Its working fine and I'm getting a sorted dataframe with NaT values in the bottom

    date_of_last_hoorah_given                              email   first_name  \
16 2016-12-19 07:36:08.000000              [email protected]        Mindy   
29 2016-12-19 07:36:08.000000              [email protected]         Judi   
7  2016-12-19 07:36:08.000000                  [email protected]         Chao   
21 2016-12-19 07:36:08.000000              [email protected]         Bala   
12 2016-12-19 07:36:08.000000            [email protected]       Pushpa   
30 2016-12-22 07:36:08.000000       [email protected]      Sparrow   
28 2016-12-22 07:36:08.000000         [email protected]      Sanjeev   
27 2016-12-22 07:36:08.000000     [email protected]  Twinklenose   
25 2016-12-22 07:36:08.000000       [email protected]    Sweetgaze   
23 2016-12-22 07:36:08.000000            [email protected]       Shreya   
19 2016-12-22 07:36:08.000000              [email protected]       Jiahao   
15 2016-12-22 07:36:08.000000            [email protected]       Janine   
14 2016-12-22 07:36:08.000000                [email protected]         Arlo   
0  2016-12-22 07:36:08.000000         [email protected]       Aditya   
11 2016-12-22 07:36:08.000000        [email protected]      Shirley   
2  2016-12-22 07:36:08.000000             [email protected]     Minerva    
3  2016-12-22 07:36:08.000000             [email protected]        Colby   
13 2016-12-22 07:36:08.000000            [email protected]      Beverly   
6  2016-12-22 07:36:08.000000             [email protected]     Guanting   
5  2016-12-22 07:36:08.000000                  [email protected]         Chen   
18 2016-12-22 10:55:03.474683                  [email protected]          Fen   
9  2016-12-23 07:36:08.000000             [email protected]     Kourtney   
10 2016-12-23 14:30:55.206581             [email protected]       Kailee   
4  2016-12-24 07:36:08.000000                [email protected]        Jing    
31 2016-12-24 16:02:28.945809               [email protected]         Rich   
24 2016-12-25 07:36:08.000000           [email protected]       Ganesh   
8  2016-12-26 07:36:08.000000               [email protected]          Xia   
20 2016-12-27 07:36:08.000000              [email protected]       Kinley   
22 2016-12-28 07:36:08.000000   [email protected]   Honeygleam   
26 2016-12-28 15:29:48.629929             [email protected]       Indira   
17 2016-12-29 02:27:11.125078             [email protected]        Ileen   
32 2016-12-29 15:38:02.335296            [email protected]       Ragnar   
1                         NaT  [email protected]  Flitterbeam   

but when I try to get it on top with the following piece of code:

df=df.sort_values(by='date_of_last_hoorah_given',ascending=[1,0])

Im getting a valueError: Length of ascending (2) != length of by (1) Full stack trace given below:

ValueError                                Traceback (most recent call last)
<ipython-input-107-948a8354aeeb> in <module>()
      1 cd = ClientData(1)
----> 2 cd.get_inactive_users()

<ipython-input-106-ed230054ea86> in get_inactive_users(self)
    346             inactive_users_result.append(user_dict)
    347         df=pd.DataFrame(inactive_users_result)
--> 348         df=df.sort_values(by='date_of_last_hoorah_given',ascending=[1,0])
    349         print(df)

C:\Users\aditya\Anaconda3\lib\site-packages\pandas\core\frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
   3126         if com.is_sequence(ascending) and len(by) != len(ascending):
   3127             raise ValueError('Length of ascending (%d) != length of by (%d)' %
-> 3128                              (len(ascending), len(by)))
   3129         if len(by) > 1:
   3130             from pandas.core.groupby import _lexsort_indexer

ValueError: Length of ascending (2) != length of by (1)
like image 343
Aditya Hariharan Avatar asked Feb 06 '23 09:02

Aditya Hariharan


1 Answers

The issue is that NaT is maximal when sorting and therefore will always come last. In order to sort by ascending date while putting NaT in front or on top, you need to sort with two conditions.

np.lexsort will sort an array by any number of conditions and return a sorting slice similar to np.argsort

Also note that I will put the notnull condition last in the array of conditions passed to np.lexsort. np.lexsort sorts the last elements first... I don't know why, but that's the way it is.

So we should sort by df.date_of_last_hoorah_given.notnull() first because those that aren't null will have a value of True which is greater that False in a sorting context. Then we can sort by the rest of the dates.

dates = df.date_of_last_hoorah_given
sort_slice = np.lexsort([dates.values, dates.notnull().values])
df.iloc[sort_slice]

OR! as OP said in comments, this gives the same thing and is much more straight forward

df.sort_values('date_of_last_hoorah_given', na_position='first')

     date_of_last_hoorah_given                              email   first_name
1                          NaT  [email protected]  Flitterbeam
16  2016-12-19 07:36:08.000000              [email protected]        Mindy
29  2016-12-19 07:36:08.000000              [email protected]         Judi
7   2016-12-19 07:36:08.000000                  [email protected]         Chao
21  2016-12-19 07:36:08.000000              [email protected]         Bala
12  2016-12-19 07:36:08.000000            [email protected]       Pushpa
30  2016-12-22 07:36:08.000000       [email protected]      Sparrow
28  2016-12-22 07:36:08.000000         [email protected]      Sanjeev
27  2016-12-22 07:36:08.000000     [email protected]  Twinklenose
25  2016-12-22 07:36:08.000000       [email protected]    Sweetgaze
23  2016-12-22 07:36:08.000000            [email protected]       Shreya
19  2016-12-22 07:36:08.000000              [email protected]       Jiahao
15  2016-12-22 07:36:08.000000            [email protected]       Janine
14  2016-12-22 07:36:08.000000                [email protected]         Arlo
0   2016-12-22 07:36:08.000000         [email protected]       Aditya
11  2016-12-22 07:36:08.000000        [email protected]      Shirley
2   2016-12-22 07:36:08.000000             [email protected]      Minerva
3   2016-12-22 07:36:08.000000             [email protected]        Colby
13  2016-12-22 07:36:08.000000            [email protected]      Beverly
6   2016-12-22 07:36:08.000000             [email protected]     Guanting
5   2016-12-22 07:36:08.000000                  [email protected]         Chen
18  2016-12-22 10:55:03.474683                  [email protected]          Fen
9   2016-12-23 07:36:08.000000             [email protected]     Kourtney
10  2016-12-23 14:30:55.206581             [email protected]       Kailee
4   2016-12-24 07:36:08.000000                [email protected]         Jing
31  2016-12-24 16:02:28.945809               [email protected]         Rich
24  2016-12-25 07:36:08.000000           [email protected]       Ganesh
8   2016-12-26 07:36:08.000000               [email protected]          Xia
20  2016-12-27 07:36:08.000000              [email protected]       Kinley
22  2016-12-28 07:36:08.000000   [email protected]   Honeygleam
26  2016-12-28 15:29:48.629929             [email protected]       Indira
17  2016-12-29 02:27:11.125078             [email protected]        Ileen
32  2016-12-29 15:38:02.335296            [email protected]       Ragnar
like image 73
piRSquared Avatar answered Feb 08 '23 14:02

piRSquared