How to find the top N minimum values in a DataFrame (Python 3)

I have the DataFrame below with an 'Age' field and need to find the top N minimum ages in it:

import pandas as pd

DF = pd.DataFrame.from_dict({'Name':['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], 'Age':[18, 45, 35, 70, 23, 24, 50, 65, 18, 23]})

DF['Age'].min()  # returns only the single smallest age, 18

I want the two smallest ages, i.e. 18 and 23, as a list. How can I achieve this?

Note: DF contains duplicate ages (18 and 23 each appear twice), and I need unique values.

asked Dec 25 '19 at 11:12 by Learnings


People also ask

How do you find the minimum value of a DataFrame in Python?

Pandas DataFrame min() method: min() returns a Series with the minimum value of each column. By specifying the column axis (axis='columns'), min() searches across the columns instead and returns the minimum value of each row.
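A minimal sketch of the two axis settings described above, using a small made-up frame (the column names `x` and `y` are illustrative, not from the question):

```python
import pandas as pd

# Small made-up frame, for illustration only
frame = pd.DataFrame({"x": [3, 1, 4], "y": [2, 7, 0]})

# Default (axis=0, i.e. 'index'): one minimum per column
col_mins = frame.min()
print(col_mins.tolist())  # [1, 0]

# axis='columns': one minimum per row, taken across the columns
row_mins = frame.min(axis="columns")
print(row_mins.tolist())  # [2, 1, 0]
```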

How do you get the top 5 values from a DataFrame?

To get the top 5 most frequently occurring values, use df['column'].value_counts().head(n) with n = 5; there is also the solution provided by @lux7, df['column'].
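As a sketch, applied to the ages from the question: value_counts() counts each distinct value and sorts by count, most frequent first. The name `ages` is illustrative, and head(2) is used instead of head(5) because only two ages repeat in this data:

```python
import pandas as pd

# Ages from the question's sample DataFrame
ages = pd.Series([18, 45, 35, 70, 23, 24, 50, 65, 18, 23], name="Age")

# Count occurrences of each age, most frequent first, and keep the top 2
top2_frequent = ages.value_counts().head(2)
print(top2_frequent)
```

Here 18 and 23 both occur twice, so they are the top two; because their counts are tied, do not rely on which of the two comes first.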


2 Answers

You can make use of nsmallest(..) [pandas-doc]:

df.nsmallest(2, 'Age')

For the given sample data, this gives us the following; note that both rows have Age 18, because 18 occurs twice in the data:

>>> df.nsmallest(2, 'Age')
  Name  Age
0    A   18
8    I   18

Or, if you only need the values of the Age column:

>>> df['Age'].nsmallest(2)
0    18
8    18
Name: Age, dtype: int64

or you can convert it to a list:

>>> df['Age'].nsmallest(2).to_list()
[18, 18]

You can obtain the n smallest unique values by first constructing a Series of unique values:

>>> pd.Series(df['Age'].unique()).nsmallest(2)
0    18
4    23
dtype: int64

or, alternatively, with drop_duplicates(), which keeps the original index labels:

>>> df['Age'].drop_duplicates().nsmallest(2)
0    18
4    23
Name: Age, dtype: int64
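Putting the pieces together, here is a minimal self-contained sketch on the question's data. `n_smallest_unique` is a hypothetical helper name, not a pandas function; it simply wraps drop_duplicates().nsmallest(n):

```python
import pandas as pd

def n_smallest_unique(s: pd.Series, n: int) -> list:
    """Return the n smallest distinct values of s in ascending order.

    Hypothetical helper: drop repeated values, then take the n smallest.
    """
    return s.drop_duplicates().nsmallest(n).tolist()

DF = pd.DataFrame.from_dict({
    "Name": list("ABCDEFGHIJ"),
    "Age": [18, 45, 35, 70, 23, 24, 50, 65, 18, 23],
})

print(DF["Age"].nsmallest(2).tolist())   # [18, 18] (duplicates kept)
print(n_smallest_unique(DF["Age"], 2))   # [18, 23]
print(n_smallest_unique(DF["Age"], 3))   # [18, 23, 24]
```

If n is larger than the number of distinct values, nsmallest simply returns all of them, so the helper then returns every distinct age in ascending order.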
answered Oct 20 '22 at 20:10 by Willem Van Onsem


The right tool here is nsmallest, but here is another way: Series.sort_values + Series.head:

df['Age'].sort_values().head(2).tolist()
# [18, 18] (the duplicate 18 is kept)
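The same idea works at the DataFrame level with DataFrame.sort_values + DataFrame.head when you want whole rows rather than just the ages. A sketch assuming the question's data under the name `df`; kind='mergesort' is chosen here because it is a stable sort, so rows with tied ages keep their original order:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("ABCDEFGHIJ"),
    "Age": [18, 45, 35, 70, 23, 24, 50, 65, 18, 23],
})

# Whole rows with the two smallest ages (stable sort: A comes before I)
two_youngest = df.sort_values("Age", kind="mergesort").head(2)
print(two_youngest)

# Unique ages: drop rows whose Age was already seen, then sort and take 2
two_youngest_unique = (
    df.drop_duplicates(subset="Age")
      .sort_values("Age", kind="mergesort")
      .head(2)
)
print(two_youngest_unique)
```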

UPDATED

If there are duplicates, call Series.drop_duplicates first:

df['Age'].drop_duplicates().nsmallest(2).tolist()
# or: df['Age'].drop_duplicates().sort_values().head(2).tolist()
# [18, 23]

or np.sort + Series.unique:

import numpy as np
[*np.sort(df['Age'].unique())[:2]]
#[18, 23]
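Since np.unique already returns the distinct values sorted in ascending order, the separate np.sort call can be dropped. A minimal sketch on the question's ages (the name `ages` is illustrative):

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 45, 35, 70, 23, 24, 50, 65, 18, 23], name="Age")

# np.unique returns sorted distinct values; slice the first two and
# convert to plain Python ints with tolist()
smallest_two = np.unique(ages)[:2].tolist()
print(smallest_two)  # [18, 23]
```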
answered Oct 20 '22 at 20:10 by ansev