I have pivoted the Customer ID against their most frequently purchased genres of performances:
Genre            Jazz     Dance     Music  Theatre
Customer                                        
100000000001           0      3         1        2
100000000002           0      1         6        2
100000000003           0      3        13        4
100000000004           0      5         4        1
100000000005           1     10        16       14
My desired result is to append the column names according to the rankings:
Genre            Jazz     Dance     Music  Theatre          Rank1          Rank2          Rank3          Rank4
Customer                                         
100000000001           0      3         1        2          Dance        Theatre          Music           Jazz
100000000002           0      1         6        2          Music        Theatre          Dance           Jazz
100000000003           0      3        13        4          Music        Theatre          Dance           Jazz
100000000004           0      5         4        1          Dance          Music        Theatre           Jazz
100000000005           1     10        16       14          Music        Theatre          Dance           Jazz
I have looked up some threads but the closest thing I can find is idxmax. However that only gives me Rank1.
Could anyone help me to get the result I need?
Thanks a lot!
Dennis
You can get the column names from pandas DataFrame using df. columns. values , and pass this to python list() function to get it as list, once you have the data you can print it using print() statement.
Pandas DataFrame: rank() function The rank() function is used to compute numerical data ranks (1 through n) along axis. By default, equal values are assigned a rank that is the average of the ranks of those values. Index to direct ranking.
rank() method returns a rank of every respective index of a series passed. The rank is returned on the basis of position after sorting. Parameters: axis: 0 or 'index' for rows and 1 or 'columns' for Column.
Use:
i = np.argsort(df.to_numpy() * -1, axis=1)
r = pd.DataFrame(df.columns[i], index=df.index, columns=range(1, i.shape[1] + 1)) 
df = df.join(r.add_prefix('Rank'))
Details:
Use np.argsort along axis=1 to get the indices i that would sort the genres in descending order.
print(i)
array([[1, 3, 2, 0],
       [2, 3, 1, 0],
       [2, 3, 1, 0],
       [1, 2, 3, 0],
       [2, 3, 1, 0]])
Create a new dataframe r from the columns of dataframe df taken along the indices i (i.e df.columns[i]), then use DataFrame.join to join the dataframe r with df:
print(df)
              Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz
                        Try this:
dfp = (df.rank(ascending=False, axis=1).stack()
         .astype(int).rename('rank').reset_index(level=1))
df.assign(**dfp.set_index('rank', append=True)['Genre'].unstack().add_prefix('Rank'))
Output:
Genre         Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer                                                                
100000000001     0      3      1        2  Dance  Theatre    Music  Jazz
100000000002     0      1      6        2  Music  Theatre    Dance  Jazz
100000000003     0      3     13        4  Music  Theatre    Dance  Jazz
100000000004     0      5      4        1  Dance    Music  Theatre  Jazz
100000000005     1     10     16       14  Music  Theatre    Dance  Jazz
Use rank and reshape dataframe, then join back to original dataframe using assign.
Lets try stack, cumcount and sort_values
s = df.stack().sort_values(ascending=False).groupby(level=0).cumcount() + 1
s1 = (s.reset_index(1)
    .set_index(0, append=True)
    .unstack(1)
    .add_prefix("Rank")
    
    )
s1.columns = s1.columns.get_level_values(1)
then join back on your customer genre index.
df.join(s1)
                 Jazz  Dance  Music  Theatre  Rank1    Rank2    Rank3 Rank4
Customer_Genre                                                            
100000000001       0      3      1        2  Dance  Theatre    Music  Jazz
100000000002       0      1      6        2  Music  Theatre    Dance  Jazz
100000000003       0      3     13        4  Music  Theatre    Dance  Jazz
100000000004       0      5      4        1  Dance    Music  Theatre  Jazz
100000000005       1     10     16       14  Music  Theatre    Dance  Jazz
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With