 

Sorting dataframe and creating new columns based on the rank of element

I have the following dataframe:

import pandas as pd
df = pd.DataFrame(
    {
        'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        'name': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
        'Value': [1, 2, 3, 4, 5, 6, 0, 2, 4, 6, 3, 5],
    },
    columns=['name', 'id', 'Value'])

I can sort the data by id (ascending) and Value (descending) as shown below:

df = df.sort_values(['id', 'Value'], ascending=[True, False])

Printing the sorted DataFrame gives:

  name   id   Value
    D      1      4
    C      1      3
    B      1      2
    A      1      1
    B      2      6
    A      2      5
    D      2      2
    C      2      0
    B      3      6
    D      3      5
    A      3      4
    C      3      3

I would like to create 4 new columns (Rank1, Rank2, Rank3, Rank4). If a row has the highest Value within its id, Rank1 should be 1, otherwise 0; if it has the second-highest Value, Rank2 should be 1, otherwise 0; and likewise for Rank3 and Rank4.

How could I do that?

Thanks.

Zep

asked Jan 28 '23 23:01 by Zephyr



1 Answer

With df sorted by id and descending Value as above, use:

# dtype=int keeps the indicators as 0/1 (pandas >= 2.0 otherwise returns True/False)
df = df.join(pd.get_dummies(df.groupby('id').cumcount().add(1), dtype=int).add_prefix('Rank'))
print(df)
   name  id  Value  Rank1  Rank2  Rank3  Rank4
3     D   1      4      1      0      0      0
2     C   1      3      0      1      0      0
1     B   1      2      0      0      1      0
0     A   1      1      0      0      0      1
5     B   2      6      1      0      0      0
4     A   2      5      0      1      0      0
7     D   2      2      0      0      1      0
6     C   2      0      0      0      0      1
9     B   3      6      1      0      0      0
11    D   3      5      0      1      0      0
8     A   3      4      0      0      1      0
10    C   3      3      0      0      0      1
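The 1/0 values shown above depend on the pandas version: since pandas 2.0, get_dummies returns bool columns by default, so the same call prints True/False unless dtype=int is passed. The sketch below, using a small hypothetical rank Series (not from the question), shows that dtype=int yields integer indicators and that get_dummies keeps the input's index, which is what lets df.join line the indicator columns up with the original rows.

```python
import pandas as pd

# Hypothetical rank numbers, indexed like rows of a sorted frame (illustrative data only)
rank = pd.Series([1, 2, 3, 1, 2], index=[3, 2, 1, 5, 4])

# Default dtype is version-dependent (bool on pandas >= 2.0, uint8 before)
as_default = pd.get_dummies(rank).add_prefix('Rank')
# dtype=int forces 0/1 integer columns regardless of version
as_int = pd.get_dummies(rank, dtype=int).add_prefix('Rank')

print(as_default.dtypes.tolist())
print(as_int)
```

Because as_int shares rank's index, joining it onto a frame with that index aligns row by row.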

Details:

To number the rows within each id group, use GroupBy.cumcount (which counts from 0), then add 1:

print(df.groupby('id').cumcount().add(1))
3     1
2     2
1     3
0     4
5     1
4     2
7     3
6     4
9     1
11    2
8     3
10    4
dtype: int64
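cumcount only counts rows in their current order within each group, so it equals a rank only after the frame has been sorted by Value in descending order. A minimal sketch on a small hypothetical frame (not the question's data) makes this visible: on the unsorted frame the count just follows row order, while after sorting and restoring the original index it gives each row's rank.

```python
import pandas as pd

# Hypothetical frame: two groups, rows deliberately not ordered by Value
df = pd.DataFrame({'id': [1, 1, 1, 2, 2], 'Value': [10, 30, 20, 5, 7]})

# On the unsorted frame, cumcount just follows row order inside each id
unsorted_count = df.groupby('id').cumcount().add(1)
print(unsorted_count.tolist())  # [1, 2, 3, 1, 2]

# Sort by id, then Value descending, so the running count becomes the rank
s = df.sort_values(['id', 'Value'], ascending=[True, False])
ranks = s.groupby('id').cumcount().add(1)

# Back in original row order: row 1 (Value 30) is rank 1 in id 1, and so on
print(ranks.sort_index().tolist())  # [3, 1, 2, 2, 1]
```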

For the indicator columns, use get_dummies with add_prefix:

print(pd.get_dummies(df.groupby('id').cumcount().add(1), dtype=int).add_prefix('Rank'))
    Rank1  Rank2  Rank3  Rank4
3       1      0      0      0
2       0      1      0      0
1       0      0      1      0
0       0      0      0      1
5       1      0      0      0
4       0      1      0      0
7       0      0      1      0
6       0      0      0      1
9       1      0      0      0
11      0      1      0      0
8       0      0      1      0
10      0      0      0      1
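If you would rather not depend on the frame being sorted first, a different technique from the one in this answer is GroupBy.rank with ascending=False, which ranks each row's Value within its id directly; method='first' gives every row a distinct rank by breaking ties in order of appearance. A sketch on the question's data, assuming the same 0/1 output is wanted:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        'name': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
        'Value': [1, 2, 3, 4, 5, 6, 0, 2, 4, 6, 3, 5],
    },
    columns=['name', 'id', 'Value'])

# Rank Value inside each id, highest = 1; method='first' breaks ties by row order
rank = df.groupby('id')['Value'].rank(ascending=False, method='first').astype(int)

# One 0/1 column per rank position, aligned to df by index
out = df.join(pd.get_dummies(rank, dtype=int).add_prefix('Rank'))
print(out)
```

The rows stay in their original order here; sort afterwards if you want the layout shown above.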
answered May 20 '23 04:05 by jezrael
