I want to find the highest 3 values of each column in a dataframe, and return the index names, ordered by value. The dataframe looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({"u1": [1, 2, -3, 4, 5],
                   "u2": [8, -4, 5, 6, 7],
                   "u3": [np.nan, np.nan, np.nan, np.nan, np.nan]},  # np.NaN was removed in NumPy 2.0
                  index=["q1", "q2", "q3", "q4", "q5"])
The result would look like this:
u1  u2  u3
q5  q1  NaN
q4  q5  NaN
q2  q4  NaN
You can count duplicate rows by counting the True values in the pandas.Series returned by duplicated(); the sum() method does this. To count the False values instead (the number of non-duplicate rows), invert the Series with the negation operator ~ and then call sum().
The DataFrame.duplicated() method finds duplicate rows in a DataFrame. It returns a Boolean Series that marks each row as a duplicate (True) or not (False); by default, the first occurrence of a repeated row is not marked as a duplicate.
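As a minimal sketch of the counting described above (the frame `df_dup` and its values are made up for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical example data: row 2 repeats row 0.
df_dup = pd.DataFrame({"a": [1, 2, 1, 3], "b": ["x", "y", "x", "z"]})

# Boolean Series: True for every row that repeats an earlier row.
dup_mask = df_dup.duplicated()

n_duplicates = int(dup_mask.sum())      # number of True values
n_non_duplicates = int((~dup_mask).sum())  # number of False values

print(n_duplicates, n_non_duplicates)
```

Here only the second occurrence of the repeated row is flagged, because duplicated() keeps the first occurrence by default.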
Select the first N rows from a DataFrame using head(). In pandas, the DataFrame class provides a head() function that returns the first n rows of a DataFrame. If n is not provided, it defaults to 5.
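A short sketch of head() with and without an explicit n (the frame `df_head` is a made-up example, not data from the question):

```python
import pandas as pd

# Hypothetical example data: ten rows with values 0..9.
df_head = pd.DataFrame({"value": range(10)})

first_default = df_head.head()   # n omitted, so the first 5 rows
first_three = df_head.head(3)    # the first 3 rows

print(len(first_default), first_three["value"].tolist())
```

Note that head() returns rows in their existing order; it does not sort, so on its own it does not give the largest values.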
You can use apply with the pandas.Series.nlargest function:
df.apply(lambda x: pd.Series(x.nlargest(3).index))
   u1  u2   u3
0  q5  q1  NaN
1  q4  q5  NaN
2  q2  q4  NaN