How to keep only the top N values in a dataframe

Question

I have a pandas DataFrame. For every row, I want to keep only the top N (N=3) values and set the others to NaN:

import pandas as pd
import numpy as np

data = np.array([['','day1','day2','day3','day4','day5'],
                ['larry',1,4,4,3,5],
                ['gunnar',2,-1,3,4,4],
                ['tin',-2,5,5, 6,7]])
                
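# np.array stores every cell of this mixed array as a string,
# so convert the values to int before building the frame
df = pd.DataFrame(data=data[1:,1:].astype(int),
                  index=data[1:,0],
                  columns=data[0,1:])
print(df)

The output is: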

       day1 day2 day3 day4 day5
larry     1    4    4    3    5
gunnar    2   -1    3    4    4
tin      -2    5    5    6    7

I want to get

       day1 day2 day3 day4 day5
larry   NaN    4    4  NaN    5
gunnar  NaN  NaN    3    4    4
tin     NaN    5  NaN    6    7

This is similar to "pandas: Keep only top n values and set others to 0", but I need to keep only the N highest available values in each row; otherwise the average is not correct.

For the result above I want to keep first 5 only

Quang Hoang · Accepted Answer

You can use np.unique to get the sorted distinct values of the whole frame, take the 5th largest as a threshold, and mask everything below it with where:

uniques = np.unique(df)

# note: this raises IndexError if there are fewer than 5 distinct values
thresh = uniques[-5]
df.where(df >= thresh)

Output:

        day1  day2  day3  day4  day5
larry    NaN   4.0     4     3     5
gunnar   NaN   NaN     3     4     4
tin      NaN   5.0     5     6     7
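
The open question in the comment above (fewer than 5 distinct values) can be handled with a small guard. The following is only a sketch: the helper name keep_top_distinct is hypothetical, and returning the frame unchanged when there are too few distinct values is an assumption about the desired behavior, not something stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 4, 4, 3, 5], [2, -1, 3, 4, 4], [-2, 5, 5, 6, 7]],
    index=["larry", "gunnar", "tin"],
    columns=["day1", "day2", "day3", "day4", "day5"],
)


def keep_top_distinct(frame, k):
    """Hypothetical helper: keep values >= the k-th largest distinct value."""
    # np.unique returns the distinct values sorted in ascending order
    uniques = np.unique(frame.to_numpy())
    if len(uniques) <= k:
        # Assumption: with k or fewer distinct values there is nothing to mask
        return frame.copy()
    # Cells below the threshold become NaN
    return frame.where(frame >= uniques[-k])


result = keep_top_distinct(df, 5)
print(result)
```

Note that this threshold is computed over the whole frame, not per row, so rows can end up with different numbers of surviving values.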

Update: On second look, to keep the top 3 values in each row, I think you can do:

df.apply(pd.Series.nlargest, n=3, axis=1).reindex(df.columns, axis=1)

Output:

        day1  day2  day3  day4  day5
larry    NaN   4.0   4.0   NaN   5.0
gunnar   NaN   NaN   3.0   4.0   4.0
tin      NaN   5.0   NaN   6.0   7.0
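
Since the question mentions averaging, here is a sketch of a different route to the same kind of row-wise mask, using DataFrame.rank instead of nlargest, followed by the row mean. This is a swapped-in technique, not the answer's method; the variable names are illustrative, and it assumes that when values tie, any one of the tied cells may be kept.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 4, 4, 3, 5], [2, -1, 3, 4, 4], [-2, 5, 5, 6, 7]],
    index=["larry", "gunnar", "tin"],
    columns=["day1", "day2", "day3", "day4", "day5"],
)

n = 3

# Rank each row in descending order; method="first" gives tied values
# distinct ranks, so exactly n cells per row get a rank <= n
ranks = df.rank(axis=1, method="first", ascending=False)

# Keep the n highest values in each row, set the rest to NaN
top_n = df.where(ranks <= n)

# mean() skips NaN by default, so each row averages only its kept values
row_means = top_n.mean(axis=1)

print(top_n)
print(row_means)
```

Because the mask is computed in one vectorized step, this avoids calling a Python function per row, which may matter on large frames.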

Tags: python, pandas, dataframe, numpy

Larry Cai
