 

Group a multi-indexed pandas dataframe by one of its levels?


Is it possible to group a pandas DataFrame that has a two-level MultiIndex by just one of its index levels?

The only way I know of is to call reset_index on the MultiIndex frame, group on the resulting column, and then call set_index again. I am sure there is a better way, and I want to know how.
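For comparison, here is a minimal sketch of the reset_index round trip described above next to grouping on the index level directly. The frame, its values, and the level names outer and inner are made up for illustration; both routes should produce the same result.

```python
import pandas as pd

# Hypothetical two-level frame; 'outer' and 'inner' are illustrative level names.
idx = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# Workaround from the question: move the levels into columns, then group on one.
# The grouped result is indexed by 'outer' again, so no extra set_index is needed here.
via_reset = df.reset_index().groupby("outer")[["value"]].sum()

# Direct approach: group on the index level without touching the index.
via_level = df.groupby(level="outer").sum()

print(via_level)
```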

asked Sep 08 '13 22:09 by silencer




2 Answers

Yes, use the level parameter of groupby, which accepts a level name, a level position, or a list of either. Example:

```
In [26]: s
first  second  third
bar    doo     one      0.404705
               two      0.577046
baz    bee     one     -1.715002
               two     -1.039268
foo    bop     one     -0.370647
               two     -1.157892
qux    bop     one     -1.344312
               two      0.844885
dtype: float64

In [27]: s.groupby(level=['first','second']).sum()
first  second
bar    doo       0.981751
baz    bee      -2.754270
foo    bop      -1.528539
qux    bop      -0.499427
dtype: float64
```
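The level argument also takes integer positions, and it works on any level, not only the outermost one. A small sketch on a hypothetical two-level Series (the values are invented) that matches the question's two-level setup:

```python
import pandas as pd

# Hypothetical two-level Series; level names mirror the example above.
s = pd.Series(
    [0.5, 1.5, -1.0, 2.0],
    index=pd.MultiIndex.from_product(
        [["bar", "baz"], ["one", "two"]], names=["first", "second"]
    ),
)

# Group by the outer level, once by name and once by position.
by_name = s.groupby(level="first").sum()
by_pos = s.groupby(level=0).sum()

# Group by the inner level instead, averaging across the outer level.
by_inner = s.groupby(level="second").mean()

print(by_name)
print(by_inner)
```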
answered Sep 18 '22 06:09 by elyase


In recent versions of pandas, you can pass index level names to groupby just like column names (i.e. without the level keyword), so a single call can group by index levels and columns at the same time.

```
>>> import pandas as pd
>>> pd.__version__
'1.0.5'
>>> df = pd.DataFrame({
...     'first': ['a', 'a', 'a', 'b', 'b', 'b'],
...     'second': ['x', 'y', 'x', 'z', 'y', 'z'],
...     'column': ['k', 'k', 'l', 'l', 'm', 'n'],
...     'data': [0, 1, 2, 3, 4, 5],
... }).set_index(['first', 'second'])
>>> df.groupby('first').sum()
       data
first
a         3
b        12
>>> df.groupby(['second', 'column']).sum()
               data
second column
x      k          0
       l          2
y      k          1
       m          4
z      l          3
       n          5
```
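The same mixed grouping can also be spelled with pd.Grouper(level=...), which marks the index level explicitly; the pandas user guide shows this form alongside passing level names directly. A sketch reusing the data above, selecting only the data column so the result does not depend on how a given pandas version handles the string column in sum():

```python
import pandas as pd

# Same data as the session above, rebuilt so this snippet runs on its own.
df = pd.DataFrame({
    "first": ["a", "a", "a", "b", "b", "b"],
    "second": ["x", "y", "x", "z", "y", "z"],
    "column": ["k", "k", "l", "l", "m", "n"],
    "data": [0, 1, 2, 3, 4, 5],
}).set_index(["first", "second"])

# Mix an index level name ('first') and a column label ('column') in one call.
implicit = df.groupby(["first", "column"])["data"].sum()

# Same grouping, with the index level named explicitly through pd.Grouper.
explicit = df.groupby([pd.Grouper(level="first"), "column"])["data"].sum()

print(implicit)
```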

The column and index level names you group by must be unambiguous: if a column and an index level share the same name, groupby raises a ValueError.
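A minimal sketch of that error and two ways around it, using a hypothetical frame in which key is both the index name and a column label. The exact wording of the error message may differ between pandas versions, so the snippet just captures it:

```python
import pandas as pd

# Hypothetical frame: 'key' is both the index name and a column label.
df = pd.DataFrame(
    {"key": ["p", "q", "p"], "data": [1, 2, 3]},
    index=pd.Index(["x", "y", "x"], name="key"),
)

error_message = ""
try:
    df.groupby("key").sum()
except ValueError as exc:
    # pandas refuses to guess whether 'key' means the level or the column.
    error_message = str(exc)

# level= always refers to the index level ...
by_level = df.groupby(level="key")["data"].sum()
# ... while passing the column itself (a Series) refers to the column.
by_column = df.groupby(df["key"])["data"].sum()

print(error_message)
```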

answered Sep 22 '22 06:09 by HoosierDaddy