Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get index of the minimum of multi-index Pandas DataFrame using level

I have a Pandas DataFrame that is multiindexed and want to find the minimum value of a certain column in a subset of rows on each level, and get the entire contents of those rows.

import pandas as pd

idx = pd.MultiIndex.from_product([['v1', 'v2'],
                                  ['record' + str(i) for i in range(1, 7)]])

df = pd.DataFrame([[2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140],
                   [2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140]],
                  columns=['col1', 'col2'],
                  index=idx)

My structure:

                 col1  col2
level1 level2
v1     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140
v2     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140

Example desired output I want the minimum value of another column where col1 == 5:

                 col1  col2
level1 level2
v1     record5    5.0   114
v2     record5    5.0   114

I know that I can get a subset of rows by using a comparison statement.

df.ix[df['col1'] == 5]

And I also know that I can get the minimum values of a column within that subset from all levels.

df['col2'][df['col1'] == 5].min(level='level1')

And if I want to specify the level, then I can get the index of 1 row on specific level.

df.ix['v1', pay_up_file.ix['v1']['col2'][(df.ix['v1']['col1'] == 5)].idxmin()]

But I cannot figure out if there is an efficient way to get the indexes from all levels

There does not seem to be a method available along the lines of this:

df['col2'][df['col1'] == 5].idxmin(level='level1')

I can get to what I want with this:

df.ix[
  (df['col1'] == 5) & 
  (df['col2'].isin(df['col2'][df['col1'] == 5].min(level='level1').values))
]

But with everything else that is in Pandas, is there a better way to get to my output?

like image 242
getglad Avatar asked Jun 16 '16 17:06

getglad


People also ask

How do you find the minimum value in pandas?

Pandas DataFrame min() MethodThe min() method returns a Series with the minimum value of each column. By specifying the column axis ( axis='columns' ), the max() method searches column-wise and returns the minimum value for each row.

How do I get rid of Multi-Level index in pandas?

By using DataFrame. droplevel() or DataFrame. columns. droplevel() you can drop a level from multi-level column index from pandas DataFrame.

How do I get the minimum of two columns in pandas?

Use min() function on a series to find the minimum value in the series. b) Get row index label or position of minimum values among rows and columns : Dataframe. idxmin() : This function returns index of first occurrence of minimum over requested axis.

How to get minimum values from a particular column in pandas Dataframe?

We’ll use ‘Weight’ and ‘Salary’ columns of this data in order to get the index of minimum values from a particular column in Pandas DataFrame. Code #1: Check the index at which minimum weight value is present. We can verify whether the minimum value is present in index or not.

How do I find the minimum value in a Dataframe?

Dataframe.idxmin () : This function returns index of first occurrence of minimum over requested axis. While finding the index of the minimum value across any index, all NA/null values are excluded. Use idxmin () function to find the index/label of the minimum value along the index axis.

What is the smallest number in the column x1 in pandas?

The smallest number in the column x1 is the number 1. The previous example has explained how to get the maxima and minima of a pandas DataFrame column. This example shows how to find the row index positions that correspond to the max and min values.

What is a multi-index Dataframe?

However, things can get really hairy when multi-index dataframes are involved. A multi-index (also known as hierarchical index) dataframe uses more than one column as the index of the dataframe. A multi-index dataframe allows you to store your data in multi-dimension format, and opens up a lot of exciting to represent your data.


2 Answers

This should work:

df.loc[df.loc[df.col1 == 5.].groupby(level=0).col2.idxmin()]

            col1  col2
v1 record5   5.0   114
v2 record5   5.0   114

Note

I'm using idxmin as you thought you ought to. But the context matters. I'm using it following a groupby(level=0).col2.idxmin() which acts as you thought col2.idxmin(level=...) should.

like image 88
piRSquared Avatar answered Dec 04 '22 01:12

piRSquared


>>> (df[df.col1 == 5]
     .groupby(level=0, as_index=False).col2
     .apply(lambda group: group.nsmallest(1))
0  v1  record5    114
1  v2  record5    114
dtype: int64

Or...

>>> df[df.col1 == 5].groupby(level=0).col2.nsmallest(1)
v1  v1  record5    114
v2  v2  record5    114
dtype: int64

But I'm not sure why the first level shows twice (i.e. 'v1' 'v1' ...).

like image 45
Alexander Avatar answered Dec 04 '22 02:12

Alexander