
Sort pandas df subset of rows (within a group) by specific column

Say I have the following DataFrame:

df


A B C D E
z k s 7 d
z k s 6 l
x t r 2 e
x t r 1 x
u c r 8 f
u c r 9 h
y t s 5 l
y t s 2 o
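
For anyone who wants to reproduce this, the sample frame above can be rebuilt roughly as follows. This is only a sketch: the default integer index (0 to 7) is an assumption, though it matches the index shown in the answers' output.

```python
import pandas as pd

# Rebuild the sample frame from the question, column by column.
df = pd.DataFrame({
    "A": list("zzxxuuyy"),
    "B": list("kkttcctt"),
    "C": list("ssrrrrss"),
    "D": [7, 6, 2, 1, 8, 9, 5, 2],
    "E": list("dlexfhlo"),
})
print(df)
```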

I would like to sort the rows by column D within each group of rows (here, a group is the set of rows that share the same values in columns A, B and C).

The expected output would be:

df


A B C D E
z k s 6 l
z k s 7 d
x t r 1 x
x t r 2 e
u c r 8 f
u c r 9 h
y t s 2 o
y t s 5 l

How can I do this kind of operation?

Salvatore Nedia asked Jun 05 '21 00:06



3 Answers

I think it should be as simple as this:

df = df.sort_values(["A", "B", "C", "D"])
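
One caveat that may be worth checking: sorting on all four columns also orders the groups themselves alphabetically, so the z, x, u, y group order shown in the expected output is not kept. A minimal sketch, with the sample frame rebuilt from the question (the name `out` is just for illustration):

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "A": list("zzxxuuyy"),
    "B": list("kkttcctt"),
    "C": list("ssrrrrss"),
    "D": [7, 6, 2, 1, 8, 9, 5, 2],
    "E": list("dlexfhlo"),
})

# Plain multi-column sort: D is sorted within each (A, B, C) group,
# but the groups are also reordered by their key values.
out = df.sort_values(["A", "B", "C", "D"])
print(out)
```

If the order of the groups does not matter, this is the shortest option.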
saedx1 answered Sep 27 '22 23:09


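You can use groupby and sort_values (also credit to @Henry Ecker for his comment). Passing sort=False keeps the groups in their original order, and group_keys=False keeps the group keys out of the index:

df.groupby(['A', 'B', 'C'], group_keys=False, sort=False).apply(pd.DataFrame.sort_values, 'D')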

output:

    A   B   C   D   E
1   z   k   s   6   l
0   z   k   s   7   d
3   x   t   r   1   x
2   x   t   r   2   e
4   u   c   r   8   f
5   u   c   r   9   h
7   y   t   s   2   o
6   y   t   s   5   l
Ehsan answered Sep 27 '22 23:09


Let us try using ngroup to create a helper column that numbers the groups in order of appearance, then sort by that column and D:

df['new1'] = df.groupby(['A','B','C'],sort=False).ngroup()
df = df.sort_values(['new1','D']).drop('new1',axis=1)
df
   A  B  C  D  E
1  z  k  s  6  l
0  z  k  s  7  d
3  x  t  r  1  x
2  x  t  r  2  e
4  u  c  r  8  f
5  u  c  r  9  h
7  y  t  s  2  o
6  y  t  s  5  l
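
A variant of the same ngroup idea that leaves the original df untouched and renumbers the index, in case that is preferred. This is only a sketch; the temporary column name `_grp` and the final `reset_index` are assumptions, not part of the answer above:

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "A": list("zzxxuuyy"),
    "B": list("kkttcctt"),
    "C": list("ssrrrrss"),
    "D": [7, 6, 2, 1, 8, 9, 5, 2],
    "E": list("dlexfhlo"),
})

keys = ["A", "B", "C"]

out = (
    # ngroup with sort=False numbers the groups 0, 1, 2, ... by first appearance.
    df.assign(_grp=df.groupby(keys, sort=False).ngroup())
      .sort_values(["_grp", "D"])   # group order first, then D within each group
      .drop(columns="_grp")         # the helper column is no longer needed
      .reset_index(drop=True)       # optional: renumber rows 0..7
)
print(out)
```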
BENY answered Sep 27 '22 22:09