I have the following dataframe:
id p1 p2 p3 p4
1 0 9 1 4
2 0 2 3 4
3 1 3 10 7
4 1 5 3 1
5 2 3 7 10
I need to reshape the data frame in a way that for each id it will have the top 3 columns with the highest values. The result would be like this:
id top1 top2 top3
1 p2 p4 p3
2 p4 p3 p2
3 p3 p4 p2
4 p2 p3 p4/p1
5 p4 p3 p2
It shows the top 3 best sellers for every user_id
. I have already done it using the dplyr
package in R, but I am looking for the pandas equivalent.
head(n) to get the first n rows of the DataFrame. It takes one optional argument n (number of rows you want to get from the start). By default n = 5, it return first 5 rows if value of n is not passed to the method.
Pandas nlargest function can take more than one variable to order the top rows. We can give a list of variables as input to nlargest and get first n rows ordered by the list of columns in descending order. Here we get top 3 rows with largest values in column “lifeExp” and then “gdpPercap”.
pandas DataFrame. head() method is used to get the top or bottom N rows of the DataFrame.
Pandas DataFrame max() Method The max() method returns a Series with the maximum value of each column. By specifying the column axis ( axis='columns' ), the max() method searches column-wise and returns the maximum value for each row.
You could use np.argsort
to find the indices of the n largest items for each row:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
'p1': [0, 0, 1, 1, 2],
'p2': [9, 2, 3, 5, 3],
'p3': [1, 3, 10, 3, 7],
'p4': [4, 4, 7, 1, 10]})
df = df.set_index('id')
nlargest = 3
order = np.argsort(-df.values, axis=1)[:, :nlargest]
result = pd.DataFrame(df.columns[order],
columns=['top{}'.format(i) for i in range(1, nlargest+1)],
index=df.index)
print(result)
yields
top1 top2 top3
id
1 p2 p4 p3
2 p4 p3 p2
3 p3 p4 p2
4 p2 p3 p1
5 p4 p3 p2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With