Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding highest n values of every column in dataframe [duplicate]

I want to find the highest 3 values of each column in a dataframe, and return the index names, ordered by value. The dataframe looks like this:

df = pd.DataFrame({"u1":[1,2,-3,4,5],
                   "u2":[8,-4,5,6,7],
                   "u3":[np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]},
                   index=["q1","q2","q3","q4","q5"])

The result would look like this:

u1   u2   u3
q5   q1   NaN
q4   q5   NaN
q2   q4   NaN
like image 613
d_gnz Avatar asked May 20 '20 16:05

d_gnz


People also ask

How do you find the total number of duplicated values in the DataFrame are?

You can count the number of duplicate rows by counting True in pandas. Series obtained with duplicated() . The number of True can be counted with sum() method. If you want to count the number of False (= the number of non-duplicate rows), you can invert it with negation ~ and then count True with sum() .

How do you find duplicates in a DataFrame column?

DataFrame. duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean series which identifies whether a row is duplicate or unique.

How will you find the top 5 records of a DataFrame?

Select first N Rows from a Dataframe using head() function In Python's Pandas module, the Dataframe class provides a head() function to fetch top rows from a Dataframe i.e. It returns the first n rows from a dataframe. If n is not provided then default value is 5.


1 Answers

You can use apply with pandas.Series.nlargest function.

df.apply(lambda x: pd.Series(x.nlargest(3).index))
   u1  u2   u3
0  q5  q1  NaN
1  q4  q5  NaN
2  q2  q4  NaN
like image 74
Dishin H Goyani Avatar answered Sep 22 '22 14:09

Dishin H Goyani