
How to sort a pandas DataFrame on a JSON field

I have data like this in a pandas DataFrame:

   id     import_id              investor_id     loan_id      meta
   35736  unremit_loss_100312         Q05         0051765139  {u'total_paid': u'75', u'total_expense': u'75'}
   35737  unremit_loss_100313         Q06         0051765140  {u'total_paid': u'77', u'total_expense': u'78'}
   35739  unremit_loss_100314         Q06         0051765141  {u'total_paid': u'80', u'total_expense': u'65'}

How can I sort the rows by total_expense, which is a value inside the JSON-like meta field?

The output should be:

id     import_id              investor_id     loan_id      meta
35739  unremit_loss_100314         Q06         0051765141  {u'total_paid': u'80', u'total_expense': u'65'}
35736  unremit_loss_100312         Q05         0051765139  {u'total_paid': u'75', u'total_expense': u'75'}
35737  unremit_loss_100313         Q06         0051765140  {u'total_paid': u'77', u'total_expense': u'78'}

Jameel Grand asked Apr 09 '19 07:04


2 Answers

Setup and Preprocessing

import ast
import numpy as np

# Parse the string representation into real dicts, unless already parsed
if isinstance(df.at[0, 'meta'], str):
    df['meta'] = df['meta'].map(ast.literal_eval)
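
As a side note on the preprocessing step: the meta strings shown in the question look like Python dict reprs (single quotes, u prefixes) rather than valid JSON, which is presumably why ast.literal_eval is used instead of json.loads. A minimal sketch, assuming a cell is stored as a string exactly like the question's sample (the names raw, parsed and is_valid_json are illustrative only):

```python
import ast
import json

# A sample cell copied from the question; assumed to be stored as a string.
raw = "{u'total_paid': u'75', u'total_expense': u'75'}"

# ast.literal_eval safely evaluates Python literals, including u'' strings.
parsed = ast.literal_eval(raw)

# json.loads rejects this text because JSON requires double-quoted strings.
try:
    json.loads(raw)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False
```

If the column really did hold valid JSON text, json.loads would be the more natural parser.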

str.get with Series.argsort

df.iloc[df['meta'].str.get('total_expense').astype(int).argsort()]

      id            import_id investor_id   loan_id                                         meta
2  35739  unremit_loss_100314         Q06  51765141  {'total_paid': '80', 'total_expense': '65'}
0  35736  unremit_loss_100312         Q05  51765139  {'total_paid': '75', 'total_expense': '75'}
1  35737  unremit_loss_100313         Q06  51765140  {'total_paid': '77', 'total_expense': '78'}

List Comprehension

df.iloc[np.argsort([int(x.get('total_expense', '-1')) for x in df['meta']])]

      id            import_id investor_id   loan_id                                         meta
2  35739  unremit_loss_100314         Q06  51765141  {'total_paid': '80', 'total_expense': '65'}
0  35736  unremit_loss_100312         Q05  51765139  {'total_paid': '75', 'total_expense': '75'}
1  35737  unremit_loss_100313         Q06  51765140  {'total_paid': '77', 'total_expense': '78'}

If you need to handle NaNs/missing data, use

u = [  
  int(x.get('total_expense', '-1')) if isinstance(x, dict) else -1 
  for x in df['meta']
]
df.iloc[np.argsort(u)]

      id            import_id investor_id   loan_id                                         meta
2  35739  unremit_loss_100314         Q06  51765141  {'total_paid': '80', 'total_expense': '65'}
0  35736  unremit_loss_100312         Q05  51765139  {'total_paid': '75', 'total_expense': '75'}
1  35737  unremit_loss_100313         Q06  51765140  {'total_paid': '77', 'total_expense': '78'}
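
The same ordering can probably also be obtained without argsort, by passing a key function to sort_values. This is only a sketch: it assumes pandas 1.1 or newer (where sort_values accepts key=), a meta column that already holds dicts, and a trimmed-down copy of the question's data. The helper name expense_key is hypothetical.

```python
import pandas as pd

# Sample data modeled on the question; meta is assumed to hold parsed dicts.
df = pd.DataFrame({
    'id': [35736, 35737, 35739],
    'meta': [
        {'total_paid': '75', 'total_expense': '75'},
        {'total_paid': '77', 'total_expense': '78'},
        {'total_paid': '80', 'total_expense': '65'},
    ],
})

def expense_key(meta_col):
    # Hypothetical helper: map each dict to an int, using -1 for missing data.
    return meta_col.map(
        lambda m: int(m.get('total_expense', -1)) if isinstance(m, dict) else -1
    )

# key= receives the whole column and must return a same-length Series.
result = df.sort_values('meta', key=expense_key)
```

The -1 fallback mirrors the list comprehension above, so rows without a usable total_expense sort first.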

cs95 answered Sep 29 '22 15:09


Use:

print (df)
      id            import_id investor_id   loan_id  \
0  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                               meta  
0   {u'total_paid': u'75', u'total_expense': u'75'}  
1   {u'total_paid': u'75', u'total_expense': u'20'}  
2  {u'total_paid': u'75', u'total_expense': u'100'}  

import ast

df['meta'] = df['meta'].apply(ast.literal_eval)

df = df.iloc[df['meta'].str['total_expense'].astype(int).argsort()]

print (df)
      id            import_id investor_id   loan_id  \
1  35736  unremit_loss_100312         Q05  51765139   
0  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                           meta  
1   {'total_paid': '75', 'total_expense': '20'}  
0   {'total_paid': '75', 'total_expense': '75'}  
2  {'total_paid': '75', 'total_expense': '100'} 
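
The astype(int) step matters because the values inside meta are strings. A small sketch in plain Python, using the three expense values from the sample above, shows how text ordering differs from numeric ordering (the variable names are illustrative):

```python
# Expense values as stored in meta: strings, not numbers.
expenses = ['75', '20', '100']

# String comparison is character by character, so '100' sorts before '20'.
as_text = sorted(expenses)

# Converting to int first gives the numeric order.
as_numbers = sorted(expenses, key=int)
```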

If the total_expense key may be missing in some rows, replace the missing values with an integer lower than all other values, such as -1, so that those rows sort first:

print (df)
      id            import_id investor_id   loan_id  \
0  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
2  35736  unremit_loss_100312         Q05  51765139   

                                              meta  
0  {u'total_paid': u'75', u'total_expense': u'75'}  
1  {u'total_paid': u'75', u'total_expense': u'20'}  
2                           {u'total_paid': u'75'} 

df['meta'] = df['meta'].apply(ast.literal_eval)


df = df.iloc[df['meta'].str.get('total_expense').fillna(-1).astype(int).argsort()]
print (df)
      id            import_id investor_id   loan_id  \
2  35736  unremit_loss_100312         Q05  51765139   
1  35736  unremit_loss_100312         Q05  51765139   
0  35736  unremit_loss_100312         Q05  51765139   

                                          meta  
2                         {'total_paid': '75'}  
1  {'total_paid': '75', 'total_expense': '20'}  
0  {'total_paid': '75', 'total_expense': '75'}  

Another solution, using a temporary helper column (this assumes every row has a total_expense key):

df['new'] = df['meta'].str.get('total_expense').astype(int)
df = df.sort_values('new').drop('new', axis=1)
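
A variant of the helper-column approach that tolerates a missing key might look like the sketch below. It assumes meta already holds dicts and uses a cut-down copy of the sample data. The column name _expense is a hypothetical temporary name, and pd.to_numeric with errors='coerce' is swapped in for astype(int) so that gaps become NaN instead of raising.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [35736, 35736, 35736],
    'meta': [
        {'total_paid': '75', 'total_expense': '75'},
        {'total_paid': '75', 'total_expense': '20'},
        {'total_paid': '75'},  # total_expense key is missing
    ],
})

# Extract the value (None when absent), then coerce to numbers; gaps become NaN.
expense = pd.to_numeric(
    df['meta'].map(lambda m: m.get('total_expense') if isinstance(m, dict) else None),
    errors='coerce',
)

# '_expense' is a hypothetical temporary column name used only for sorting.
result = (
    df.assign(_expense=expense)
      .sort_values('_expense', na_position='first')
      .drop(columns='_expense')
)
```

Using na_position='first' puts the rows without total_expense at the top, matching the effect of fillna(-1) above. Pass na_position='last' to push them to the bottom instead.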

jezrael answered Sep 29 '22 14:09