I have data like this in a Pandas DataFrame:
id import_id investor_id loan_id meta
35736 unremit_loss_100312 Q05 0051765139 {u'total_paid': u'75', u'total_expense': u'75'}
35737 unremit_loss_100313 Q06 0051765140 {u'total_paid': u'77', u'total_expense': u'78'}
35739 unremit_loss_100314 Q06 0051765141 {u'total_paid': u'80', u'total_expense': u'65'}
How can I sort the rows by total_expense, which is a value inside the JSON-like meta field?
The output should be:
id import_id investor_id loan_id meta
35739 unremit_loss_100314 Q06 0051765141 {u'total_paid': u'80', u'total_expense': u'65'}
35736 unremit_loss_100312 Q05 0051765139 {u'total_paid': u'75', u'total_expense': u'75'}
35737 unremit_loss_100313 Q06 0051765140 {u'total_paid': u'77', u'total_expense': u'78'}
Setup and Preprocessing
import ast
import numpy as np
# Parse the string representation into real dicts (skip if already parsed)
if isinstance(df.at[0, 'meta'], str):
    df['meta'] = df['meta'].map(ast.literal_eval)
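For a reproducible starting point, the sample frame from the question can be rebuilt roughly as follows. This is a minimal sketch that assumes meta arrives as strings (for example after reading a CSV); the values are copied from the question, and loan_id is kept as text here to preserve the leading zeros.

```python
import ast

import pandas as pd

# Sample data copied from the question; meta is assumed to start out as strings.
df = pd.DataFrame({
    'id': [35736, 35737, 35739],
    'import_id': ['unremit_loss_100312', 'unremit_loss_100313', 'unremit_loss_100314'],
    'investor_id': ['Q05', 'Q06', 'Q06'],
    'loan_id': ['0051765139', '0051765140', '0051765141'],
    'meta': [
        "{u'total_paid': u'75', u'total_expense': u'75'}",
        "{u'total_paid': u'77', u'total_expense': u'78'}",
        "{u'total_paid': u'80', u'total_expense': u'65'}",
    ],
})

# literal_eval safely turns each string into a dict
# (the u'' prefix is still accepted by Python 3).
df['meta'] = df['meta'].map(ast.literal_eval)
```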
str.get with Series.argsort
df.iloc[df['meta'].str.get('total_expense').astype(int).argsort()]
id import_id investor_id loan_id meta
2 35739 unremit_loss_100314 Q06 51765141 {'total_paid': '80', 'total_expense': '65'}
0 35736 unremit_loss_100312 Q05 51765139 {'total_paid': '75', 'total_expense': '75'}
1 35737 unremit_loss_100313 Q06 51765140 {'total_paid': '77', 'total_expense': '78'}
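To see why this one-liner works, it may help to split it into its three steps. The sketch below uses a trimmed copy of the question's data (only id and meta); the intermediate names keys, order and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [35736, 35737, 35739],
    'meta': [
        {'total_paid': '75', 'total_expense': '75'},
        {'total_paid': '77', 'total_expense': '78'},
        {'total_paid': '80', 'total_expense': '65'},
    ],
})

# Step 1: pull the value out of each dict and convert it to int.
keys = df['meta'].str.get('total_expense').astype(int)   # 75, 78, 65

# Step 2: argsort gives the row positions that would sort those keys.
order = keys.argsort().to_numpy()                        # positions 2, 0, 1

# Step 3: iloc reorders the rows by position.
result = df.iloc[order]
```

Because argsort returns positions rather than labels, iloc (not loc) is the matching indexer.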
Or with a list comprehension and np.argsort:
df.iloc[np.argsort([int(x.get('total_expense', '-1')) for x in df['meta']])]
id import_id investor_id loan_id meta
2 35739 unremit_loss_100314 Q06 51765141 {'total_paid': '80', 'total_expense': '65'}
0 35736 unremit_loss_100312 Q05 51765139 {'total_paid': '75', 'total_expense': '75'}
1 35737 unremit_loss_100313 Q06 51765140 {'total_paid': '77', 'total_expense': '78'}
If you need to handle NaNs/missing data, use:
u = [
    int(x.get('total_expense', '-1')) if isinstance(x, dict) else -1
    for x in df['meta']
]
df.iloc[np.argsort(u)]
id import_id investor_id loan_id meta
2 35739 unremit_loss_100314 Q06 51765141 {'total_paid': '80', 'total_expense': '65'}
0 35736 unremit_loss_100312 Q05 51765139 {'total_paid': '75', 'total_expense': '75'}
1 35737 unremit_loss_100313 Q06 51765140 {'total_paid': '77', 'total_expense': '78'}
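The -1 sentinel places incomplete rows first. If those rows should go last instead, one possible alternative (not part of the answer above) is to coerce the keys with pd.to_numeric and sort them with na_position='last'. The frame below is made up for illustration, and the names raw, keys and result are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one row lacks the key, one row has no dict at all.
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'meta': [
        {'total_paid': '75', 'total_expense': '75'},
        {'total_paid': '75'},
        np.nan,
        {'total_paid': '75', 'total_expense': '20'},
    ],
})

# Missing keys and non-dict entries both become None, then NaN after coercion.
raw = [x.get('total_expense') if isinstance(x, dict) else None for x in df['meta']]
keys = pd.to_numeric(pd.Series(raw, index=df.index, dtype=object), errors='coerce')

# sort_values keeps NaN keys at the end; reorder the frame by the sorted labels.
result = df.loc[keys.sort_values(na_position='last').index]
```

This relies on the index labels being unique, since the final step selects rows by label.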
Use:
print (df)
id import_id investor_id loan_id \
0 35736 unremit_loss_100312 Q05 51765139
1 35736 unremit_loss_100312 Q05 51765139
2 35736 unremit_loss_100312 Q05 51765139
meta
0 {u'total_paid': u'75', u'total_expense': u'75'}
1 {u'total_paid': u'75', u'total_expense': u'20'}
2 {u'total_paid': u'75', u'total_expense': u'100'}
import ast
df['meta'] = df['meta'].apply(ast.literal_eval)
df = df.iloc[df['meta'].str['total_expense'].astype(int).argsort()]
print (df)
id import_id investor_id loan_id \
1 35736 unremit_loss_100312 Q05 51765139
0 35736 unremit_loss_100312 Q05 51765139
2 35736 unremit_loss_100312 Q05 51765139
meta
1 {'total_paid': '75', 'total_expense': '20'}
0 {'total_paid': '75', 'total_expense': '75'}
2 {'total_paid': '75', 'total_expense': '100'}
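The astype(int) step matters because the values are stored as strings, and strings compare character by character. A small plain-Python illustration using the three values from this example:

```python
values = ['75', '20', '100']

# Lexicographic order: '1' < '2' < '7', so '100' comes first.
as_text = sorted(values)

# Numeric order after converting each string to int for comparison.
as_numbers = sorted(values, key=int)
```

Here as_text is ['100', '20', '75'] while as_numbers is ['20', '75', '100'], which is the order shown in the output above.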
If the total_expense key may be missing for some rows, replace the missing values with an integer lower than all other values, such as -1, so that those rows are sorted first:
print (df)
id import_id investor_id loan_id \
0 35736 unremit_loss_100312 Q05 51765139
1 35736 unremit_loss_100312 Q05 51765139
2 35736 unremit_loss_100312 Q05 51765139
meta
0 {u'total_paid': u'75', u'total_expense': u'75'}
1 {u'total_paid': u'75', u'total_expense': u'20'}
2 {u'total_paid': u'75'}
df['meta'] = df['meta'].apply(ast.literal_eval)
df = df.iloc[df['meta'].str.get('total_expense').fillna(-1).astype(int).argsort()]
print (df)
id import_id investor_id loan_id \
2 35736 unremit_loss_100312 Q05 51765139
1 35736 unremit_loss_100312 Q05 51765139
0 35736 unremit_loss_100312 Q05 51765139
meta
2 {'total_paid': '75'}
1 {'total_paid': '75', 'total_expense': '20'}
0 {'total_paid': '75', 'total_expense': '75'}
Another solution:
df['new'] = df['meta'].str.get('total_expense').astype(int)
df = df.sort_values('new').drop('new', axis=1)
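On pandas 1.1 or later, sort_values also accepts a key callable, which avoids the temporary column. This is a sketch of that variant rather than part of the answer above; the sample frame repeats the data from this example, trimmed to id and meta.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [35736, 35736, 35736],
    'meta': [
        {'total_paid': '75', 'total_expense': '75'},
        {'total_paid': '75', 'total_expense': '20'},
        {'total_paid': '75', 'total_expense': '100'},
    ],
})

# key receives the whole 'meta' column and must return a Series of sort keys
# with the same length.
result = df.sort_values(
    'meta',
    key=lambda s: s.map(lambda d: int(d['total_expense'])),
)
```

As written, the lambda assumes every row holds a dict with a total_expense key; rows without it would need the same kind of fallback as the -1 handling shown earlier.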