
Find names of top-n highest-value columns in each pandas dataframe row

I have the following dataframe:

  id     p1 p2 p3 p4
  1      0  9  1  4
  2      0  2  3  4
  3      1  3 10  7
  4      1  5  3  1
  5      2  3  7 10

I need to reshape the dataframe so that, for each id, it lists the names of the 3 columns with the highest values, in descending order of value. The result would look like this:

 id top1 top2 top3
  1  p2   p4   p3
  2  p4   p3   p2
  3  p3   p4   p2
  4  p2   p3   p4/p1
  5  p4   p3   p2

It shows the top 3 best sellers for every id. I have already done this with the dplyr package in R, but I am looking for the pandas equivalent.

chessosapiens asked Aug 15 '16 12:08



1 Answer

You could use np.argsort to find the indices of the n largest items for each row:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
 'p1': [0, 0, 1, 1, 2],
 'p2': [9, 2, 3, 5, 3],
 'p3': [1, 3, 10, 3, 7],
 'p4': [4, 4, 7, 1, 10]})
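df = df.set_index('id')

nlargest = 3
# Sort each row in descending order and keep the positions of the first n columns.
# kind='stable' keeps tied columns in their original left-to-right order.
order = np.argsort(-df.values, axis=1, kind='stable')[:, :nlargest]
# Index a plain NumPy array of column names; newer pandas versions
# may reject 2-D indexing directly on an Index object.
result = pd.DataFrame(df.columns.to_numpy()[order],
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)],
                      index=df.index)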

print(result)

yields

   top1 top2 top3
id               
1    p2   p4   p3
2    p4   p3   p2
3    p3   p4   p2
4    p2   p3   p1
5    p4   p3   p2
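
To reuse the same argsort idea with a different n or on another frame, it could be wrapped in a small helper. This is only a sketch: the name top_n_columns is hypothetical, and it assumes all columns are numeric, contain no NaN values, and that ties should go to the leftmost column (hence the stable sort).

```python
import numpy as np
import pandas as pd


def top_n_columns(df, n=3):
    """Return, for each row, the names of the n columns with the largest values.

    Hypothetical helper: assumes numeric columns without NaN values.
    """
    values = df.to_numpy()
    # Negate so that an ascending sort yields descending order; the stable
    # sort keeps tied columns in their original left-to-right order.
    order = np.argsort(-values, axis=1, kind="stable")[:, :n]
    # Look the positions up in a plain array of column names.
    names = df.columns.to_numpy()[order]
    return pd.DataFrame(names,
                        columns=["top{}".format(i) for i in range(1, n + 1)],
                        index=df.index)


df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]}).set_index('id')

top = top_n_columns(df, n=3)
print(top)
```

With the stable sort, the tie in row 4 between p1 and p4 (both 1) resolves to p1, because p1 comes first in the column order.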
unutbu answered Oct 06 '22 22:10