Get Rankings of Column Names in Pandas Dataframe

I have pivoted Customer ID against the genres of performances each customer purchased most frequently:

Genre         Jazz  Dance  Music  Theatre
Customer                                 
100000000001     0      3      1        2
100000000002     0      1      6        2
100000000003     0      3     13        4
100000000004     0      5      4        1
100000000005     1     10     16       14

My desired result is to append, for each customer, the column names ordered by rank:

Genre         Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz

I have looked through some threads, but the closest thing I could find is idxmax, which only gives me Rank1.

Could anyone help me to get the result I need?

Thanks a lot!

Dennis

dendoniseden asked Aug 10 '20 15:08


3 Answers

Use:

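import numpy as np
import pandas as pd

i = np.argsort(df.to_numpy() * -1, axis=1)  # column positions per row, largest value first
r = pd.DataFrame(df.columns.to_numpy()[i], index=df.index, columns=range(1, i.shape[1] + 1))
df = df.join(r.add_prefix('Rank'))  # columns 1..n become Rank1..Rankn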

Details:

Use np.argsort along axis=1 to get the indices i that would sort the genres in descending order.

print(i)
array([[1, 3, 2, 0],
       [2, 3, 1, 0],
       [2, 3, 1, 0],
       [1, 2, 3, 0],
       [2, 3, 1, 0]])
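
The argsort step can be reproduced in isolation. The following is a minimal sketch, assuming the question's table holds plain integer counts; the names `order` and `ranked_names` are illustrative and not part of the original answer, and `kind="stable"` is an added choice so that tied genres keep their column order.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed to be integer counts).
df = pd.DataFrame(
    {
        "Jazz": [0, 0, 0, 0, 1],
        "Dance": [3, 1, 3, 5, 10],
        "Music": [1, 6, 13, 4, 16],
        "Theatre": [2, 2, 4, 1, 14],
    },
    index=pd.Index(range(100000000001, 100000000006), name="Customer"),
)
df.columns.name = "Genre"

# Negating the values turns NumPy's ascending argsort into a descending order.
# kind="stable" (an assumption, not in the answer) keeps ties in column order.
order = np.argsort(-df.to_numpy(), axis=1, kind="stable")

# Index a plain NumPy array of labels with the 2-D position array;
# the result has one row of genre names per customer, best rank first.
ranked_names = df.columns.to_numpy()[order]

print(order)
print(ranked_names[0])
```

Converting the column labels with `to_numpy()` before indexing avoids relying on multi-dimensional indexing of a pandas Index, which recent pandas versions reject.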

Create a new dataframe r by indexing the column names of df with i, so that each row holds the genre names in ranked order, then use DataFrame.join to join r with df:

print(df)
              Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz

Shubham Sharma answered Oct 13 '22 14:10


Try this:

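# method='first' gives tied genres distinct integer ranks, so unstack sees no duplicates
dfp = (df.rank(ascending=False, axis=1, method='first').stack()
         .astype(int).rename('rank').reset_index(level=1))
df.assign(**dfp.set_index('rank', append=True)['Genre'].unstack().add_prefix('Rank'))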

Output:

Genre         Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz

This ranks the genres within each row, reshapes the ranks so that each rank becomes a column, and attaches the result to the original dataframe using assign.
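
One thing worth checking with the rank approach is how ties are handled, because the ranks are cast to integers and then used as column labels. Below is a minimal sketch on a hypothetical one-row frame with a tie (this row is invented for illustration and does not come from the question); the names `tied`, `avg`, `first`, `long` and `wide` are also illustrative.

```python
import pandas as pd

# Hypothetical customer with a tie between Dance and Theatre.
tied = pd.DataFrame(
    {"Jazz": [0], "Dance": [2], "Music": [5], "Theatre": [2]},
    index=pd.Index([1], name="Customer"),
)
tied.columns.name = "Genre"

# Default method="average": the tied genres share the fractional rank 2.5.
avg = tied.rank(ascending=False, axis=1)

# method="first": tied genres receive distinct consecutive ranks.
first = tied.rank(ascending=False, axis=1, method="first")

# With distinct integer ranks, the reshape into Rank1..Rank4 is unambiguous.
long = first.stack().astype(int).rename("rank").reset_index(level="Genre")
wide = long.set_index("rank", append=True)["Genre"].unstack().add_prefix("Rank")

print(avg)
print(wide)
```

With the default averaging, both tied genres would truncate to the same integer rank, which leaves duplicate (Customer, rank) pairs that unstack cannot reshape.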

Scott Boston answered Oct 13 '22 14:10


Let's try stack, sort_values and cumcount:

s = df.stack().sort_values(ascending=False).groupby(level=0).cumcount() + 1
s1 = (s.reset_index(1)
       .set_index(0, append=True)
       .unstack(1)
       .add_prefix("Rank"))
s1.columns = s1.columns.get_level_values(1)
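
To see what the cumcount step produces, here is a minimal sketch on the first two customers from the question. The names `stacked` and `position` are illustrative, and `kind="stable"` is an added assumption to make the ordering of equal values deterministic.

```python
import pandas as pd

# First two customers from the question's table.
df = pd.DataFrame(
    {"Jazz": [0, 0], "Dance": [3, 1], "Music": [1, 6], "Theatre": [2, 2]},
    index=pd.Index([100000000001, 100000000002], name="Customer"),
)
df.columns.name = "Genre"

# Long format: one value per (Customer, Genre) pair, largest values first.
stacked = df.stack().sort_values(ascending=False, kind="stable")

# cumcount numbers the rows of each customer in their sorted order (0-based),
# so adding 1 gives that genre's rank within the customer.
position = stacked.groupby(level="Customer").cumcount() + 1

print(position.sort_index())
```

Each (Customer, Genre) pair now carries its rank, which is what the reshaping code pivots into one column per rank.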

Then join the result back to the original dataframe on the customer index:

df.join(s1)

                 Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer_Genre                                                            
100000000001       0      3      1        2  Dance  Theatre    Music  Jazz
100000000002       0      1      6        2  Music  Theatre    Dance  Jazz
100000000003       0      3     13        4  Music  Theatre    Dance  Jazz
100000000004       0      5      4        1  Dance    Music  Theatre  Jazz
100000000005       1     10     16       14  Music  Theatre    Dance  Jazz

Umar.H answered Oct 13 '22 14:10