Pandas: Sorting rows based on a value of a column

I have a DataFrame df like this:

ID    NAME    AGE
-----------------
M43   ab      32
M32   df      12
M54   gh      34
M43   ab      98
M43   ab      36
M43   cd      32
M32   cd      39
M43   ab      67

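I need to reorder the rows so that rows with the same ID are grouped together, with the IDs in order of first appearance and the original row order kept within each ID.
The output df_grouped should look like this: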

ID    NAME    AGE
-----------------
M43   ab      32
M43   ab      98
M43   ab      36
M43   cd      32
M43   ab      67
M32   df      12
M32   cd      39
M54   gh      34

I tried something like this:

df_grouped = df.groupby('ID')

grouped_df_list = []
for key in df.ID.unique():
    grouped_df_list.append(df_grouped.get_group(key))

Is there a better way to do this?

deadbug asked Mar 07 '23 23:03


2 Answers

You can sort by multiple columns using pd.DataFrame.sort_values:

df = df.sort_values(['ID', 'NAME'])

By default, the argument ascending is set to True.
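
As a minimal sketch of what this does on the question's sample data (the frame is rebuilt by hand here, which is an assumption about how df was created, and sorted_df is just an illustrative name):

```python
import pandas as pd

# Sample data copied from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "ID":   ["M43", "M32", "M54", "M43", "M43", "M43", "M32", "M43"],
    "NAME": ["ab", "df", "gh", "ab", "ab", "cd", "cd", "ab"],
    "AGE":  [32, 12, 34, 98, 36, 32, 39, 67],
})

# Sort by ID first, then by NAME within each ID (both ascending by default).
sorted_df = df.sort_values(["ID", "NAME"])
print(sorted_df)
```

Note that this orders the IDs alphabetically, so M32 comes before M43 and M54. That groups equal IDs together, but it is not the order of first appearance shown in the question's expected output.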

jpp answered Mar 17 '23 13:03


You can use pd.factorize to turn each key into an integer code that reflects the order in which it first appeared, then argsort those codes to get the row positions to index into your frame, e.g.:

Given:

     0   1   2
0  M43  ab  32
1  M32  df  12
2  M54  gh  34
3  M43  ab  98
4  M43  ab  36
5  M43  cd  32
6  M32  cd  39
7  M43  ab  67

Then:

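# kind='stable' keeps the original row order within each key; iloc because argsort returns positions
new_df = df.iloc[pd.factorize(df[0])[0].argsort(kind='stable')]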
# might want to consider df.reindex() instead depending...

You get:

     0   1   2
0  M43  ab  32
3  M43  ab  98
4  M43  ab  36
5  M43  cd  32
7  M43  ab  67
1  M32  df  12
6  M32  cd  39
2  M54  gh  34
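
For reference, here is a self-contained sketch of the same idea using the question's column names (ID, NAME, AGE). The kind="stable" argument and the use of iloc are choices made for this sketch, on the assumption that the original row order inside each group matters and that the index may not be a default range; the variable names codes, uniques and order are illustrative.

```python
import pandas as pd

# Sample data copied from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "ID":   ["M43", "M32", "M54", "M43", "M43", "M43", "M32", "M43"],
    "NAME": ["ab", "df", "gh", "ab", "ab", "cd", "cd", "ab"],
    "AGE":  [32, 12, 34, 98, 36, 32, 39, 67],
})

# factorize assigns each ID an integer code in order of first appearance:
# M43 -> 0, M32 -> 1, M54 -> 2.
codes, uniques = pd.factorize(df["ID"])

# A stable argsort of the codes yields row positions grouped by ID,
# with the original order preserved inside each group.
order = codes.argsort(kind="stable")

# argsort returns positions, so index with iloc rather than loc.
df_grouped = df.iloc[order]
print(df_grouped)
```

On this data the resulting index order is 0, 3, 4, 5, 7, 1, 6, 2, matching the output shown above.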

Jon Clements answered Mar 17 '23 13:03