I have a DataFrame whose index labels mix text and numbers separated by underscores, following this pattern:
sub_int1_ICA_int2
I would like to sort the index numerically, by int1 first and then by int2. The expected output would be:
sub_1_ICA_1
sub_1_ICA_2
sub_1_ICA_3
...........
sub_2_ICA_1
sub_2_ICA_2
...........
I tried convert_objects(convert_numeric=True), as suggested in many posts, but I get an error:
X.convert_objects(convert_numeric=True).sort_values(['id'], ascending=[True], inplace=True)
KeyError: 'id'
Any help would be nice!
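For context, a minimal sketch of why neither obvious call gives the wanted order, assuming a small illustrative frame whose values are made up: sort_values only looks up column names, so ['id'] raises KeyError when the frame has no such column, and sort_index compares the labels as plain strings, which puts sub_1_ICA_10 before sub_1_ICA_2.

```python
import pandas as pd

# Illustrative frame; labels follow the sub_<int1>_ICA_<int2> pattern
df = pd.DataFrame(
    {'a': [6, 1, 7, 4, 3]},
    index=['sub_1_ICA_2', 'sub_2_ICA_1', 'sub_1_ICA_10', 'sub_1_ICA_0', 'sub_2_ICA_2'],
)

# sort_values looks up column names; this frame has no column called 'id'
caught = False
try:
    df.sort_values(['id'])
except KeyError as err:
    caught = True
    print('KeyError:', err)

# sort_index compares the labels as strings, character by character,
# so 'sub_1_ICA_10' comes before 'sub_1_ICA_2'
lexical = df.sort_index()
print(lexical.index.tolist())
```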
Use reindex with a list sorted by a custom key function that looks up a dictionary of tuples:
print (df)
a
sub_1_ICA_0 4
sub_1_ICA_1 8
sub_1_ICA_10 7
sub_1_ICA_11 3
sub_1_ICA_12 2
sub_1_ICA_2 6
sub_1_ICA_3 6
sub_2_ICA_1 1
sub_2_ICA_2 3
a = df.index.tolist()
# map each label to an (int1, int2) tuple, e.g. 'sub_1_ICA_10' -> (1, 10)
b = {}
for x in a:
    i = x.split('_')
    b[x] = (int(i[1]), int(i[-1]))
print (b)
{'sub_1_ICA_10': (1, 10), 'sub_1_ICA_11': (1, 11),
'sub_1_ICA_1': (1, 1), 'sub_2_ICA_2': (2, 2),
'sub_1_ICA_0': (1, 0), 'sub_1_ICA_12': (1, 12),
'sub_1_ICA_3': (1, 3), 'sub_1_ICA_2': (1, 2),
'sub_2_ICA_1': (2, 1)}
c = sorted(a, key=lambda x: b[x])
print (c)
['sub_1_ICA_0', 'sub_1_ICA_1', 'sub_1_ICA_2', 'sub_1_ICA_3',
'sub_1_ICA_10', 'sub_1_ICA_11', 'sub_1_ICA_12', 'sub_2_ICA_1', 'sub_2_ICA_2']
df = df.reindex(c)
print (df)
a
sub_1_ICA_0 4
sub_1_ICA_1 8
sub_1_ICA_2 6
sub_1_ICA_3 6
sub_1_ICA_10 7
sub_1_ICA_11 3
sub_1_ICA_12 2
sub_2_ICA_1 1
sub_2_ICA_2 3
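The intermediate dictionary is optional, because sorted can call a key function directly. A minimal sketch, rebuilding the sample frame from the printed output above; index_key is a hypothetical helper name, and it assumes every label has the sub_int1_ICA_int2 shape:

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [4, 8, 7, 3, 2, 6, 6, 1, 3]},
    index=['sub_1_ICA_0', 'sub_1_ICA_1', 'sub_1_ICA_10', 'sub_1_ICA_11', 'sub_1_ICA_12',
           'sub_1_ICA_2', 'sub_1_ICA_3', 'sub_2_ICA_1', 'sub_2_ICA_2'],
)

def index_key(label):
    # hypothetical helper: 'sub_1_ICA_10' -> (1, 10),
    # i.e. the 2nd and the last '_'-separated fields as ints
    parts = label.split('_')
    return int(parts[1]), int(parts[-1])

# tuples compare element by element, so int1 is the primary key and int2 breaks ties
df = df.reindex(sorted(df.index, key=index_key))
print(df)
```

Because it still relies on reindex, this variant also fails if the index contains duplicate labels.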
Another pure pandas solution:
# split the index into a MultiIndex and convert it to a DataFrame
df1 = df.index.str.split('_', expand=True).to_frame()
# name the columns and restore the original index
df1.columns = list('abcd')
df1.index = df.index
# convert the numeric parts to int and sort by them
df1[['b','d']] = df1[['b','d']].astype(int)
df1 = df1.sort_values(['b','d'])
print (df1)
a b c d
sub_1_ICA_0 sub 1 ICA 0
sub_1_ICA_1 sub 1 ICA 1
sub_1_ICA_2 sub 1 ICA 2
sub_1_ICA_3 sub 1 ICA 3
sub_1_ICA_10 sub 1 ICA 10
sub_1_ICA_11 sub 1 ICA 11
sub_1_ICA_12 sub 1 ICA 12
sub_2_ICA_1 sub 2 ICA 1
sub_2_ICA_2 sub 2 ICA 2
df = df.reindex(df1.index)
print (df)
a
sub_1_ICA_0 4
sub_1_ICA_1 8
sub_1_ICA_2 6
sub_1_ICA_3 6
sub_1_ICA_10 7
sub_1_ICA_11 3
sub_1_ICA_12 2
sub_2_ICA_1 1
sub_2_ICA_2 3
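A slightly shorter pure pandas sketch attaches the two numeric parts as temporary columns, sorts by them and drops them again. The column names _int1 and _int2 are hypothetical and assume no existing column already uses them. Since nothing is reindexed, this approach should also cope with duplicate labels:

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [4, 8, 7, 3, 2, 6, 6, 1, 3]},
    index=['sub_1_ICA_0', 'sub_1_ICA_1', 'sub_1_ICA_10', 'sub_1_ICA_11', 'sub_1_ICA_12',
           'sub_1_ICA_2', 'sub_1_ICA_3', 'sub_2_ICA_1', 'sub_2_ICA_2'],
)

# split every label into a 4-level MultiIndex: ('sub', '1', 'ICA', '10'), ...
parts = df.index.str.split('_', expand=True)

df = (
    # hypothetical temporary columns holding int1 and int2 as integers
    df.assign(_int1=parts.get_level_values(1).astype(int).to_numpy(),
              _int2=parts.get_level_values(3).astype(int).to_numpy())
      .sort_values(['_int1', '_int2'])
      .drop(columns=['_int1', '_int2'])
)
print(df)
```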
And a last version with natsort (a third-party package, installable with pip install natsort):
from natsort import natsorted
df = df.reindex(natsorted(df.index))
print (df)
a
sub_1_ICA_0 4
sub_1_ICA_1 8
sub_1_ICA_2 6
sub_1_ICA_3 6
sub_1_ICA_10 7
sub_1_ICA_11 3
sub_1_ICA_12 2
sub_2_ICA_1 1
sub_2_ICA_2 3
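If adding a dependency is not an option, the same natural-sort idea can be approximated with the standard re module. This is a simplified, hand-rolled stand-in, not natsort's actual algorithm, and natural_key is a hypothetical name:

```python
import re
import pandas as pd

df = pd.DataFrame(
    {'a': [4, 8, 7, 3, 2, 6, 6, 1, 3]},
    index=['sub_1_ICA_0', 'sub_1_ICA_1', 'sub_1_ICA_10', 'sub_1_ICA_11', 'sub_1_ICA_12',
           'sub_1_ICA_2', 'sub_1_ICA_3', 'sub_2_ICA_1', 'sub_2_ICA_2'],
)

def natural_key(label):
    # re.split with a capturing group keeps the digit runs, e.g.
    # 'sub_1_ICA_10' -> ['sub_', '1', '_ICA_', '10', '']
    # Odd positions are always the captured digit runs; turn those into ints
    # so that 10 compares greater than 2.
    return [int(tok) if pos % 2 else tok
            for pos, tok in enumerate(re.split(r'(\d+)', label))]

df = df.reindex(sorted(df.index, key=natural_key))
print(df)
```

Because the split always alternates text and digit runs, every key holds strings at even positions and ints at odd positions, so Python never has to compare an int with a str.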
EDIT:
If the index contains duplicate values, reindex raises an error, so instead split the index into new columns, convert the numeric ones to int, sort, and join them back into the index:
print (df)
a
sub_1_ICA_0 4
sub_1_ICA_0 4
sub_1_ICA_1 8
sub_1_ICA_10 7
sub_1_ICA_11 3
sub_1_ICA_12 2
sub_1_ICA_2 6
sub_1_ICA_3 6
sub_2_ICA_1 1
sub_2_ICA_2 3
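# split the labels into 4 index levels, then move them into columns level_0..level_3
df.index = df.index.str.split('_', expand=True)
df = df.reset_index()
# sort numerically by int1 (level_1) and int2 (level_3)
df[['level_1','level_3']] = df[['level_1','level_3']].astype(int)
df = df.sort_values(['level_1','level_3'])
# convert only the key columns back to str, so column a keeps its integer dtype
df[['level_1','level_3']] = df[['level_1','level_3']].astype(str)
df = df.set_index(['level_0','level_1','level_2','level_3'])
df.index = df.index.map('_'.join)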
print (df)
a
sub_1_ICA_0 4
sub_1_ICA_0 4
sub_1_ICA_1 8
sub_1_ICA_2 6
sub_1_ICA_3 6
sub_1_ICA_10 7
sub_1_ICA_11 3
sub_1_ICA_12 2
sub_2_ICA_1 1
sub_2_ICA_2 3
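An alternative sketch for the duplicate case avoids the round trip through strings: compute the two keys as NumPy arrays and reorder the rows by position with np.lexsort. np.lexsort is a stable sort, so duplicate labels keep their original relative order. It assumes every label has an integer in its second and last underscore-separated fields:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'a': [4, 4, 8, 7, 3, 2, 6, 6, 1, 3]},
    index=['sub_1_ICA_0', 'sub_1_ICA_0', 'sub_1_ICA_1', 'sub_1_ICA_10', 'sub_1_ICA_11',
           'sub_1_ICA_12', 'sub_1_ICA_2', 'sub_1_ICA_3', 'sub_2_ICA_1', 'sub_2_ICA_2'],
)

# parse int1 (2nd field) and int2 (last field) of every label
fields = [label.split('_') for label in df.index]
int1 = np.array([int(f[1]) for f in fields])
int2 = np.array([int(f[-1]) for f in fields])

# np.lexsort treats the LAST key as the primary one, so pass (int2, int1)
order = np.lexsort((int2, int1))

# reorder rows by position; duplicate labels are fine because nothing is reindexed
df = df.iloc[order]
print(df)
```

Column a is never converted, so it stays an integer column.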