
Hierarchical Multi-index counts in Pandas

Say I have a multi-index dataframe in Pandas, e.g.:

                        A         B         C
X     Y      Z
bar   one    a   -0.007381 -0.365315 -0.024817
             b   -1.219794  0.370955 -0.795125
baz   three  a    0.145578  1.428502 -0.408384
             b   -0.249321 -0.292967 -1.849202
      two    a   -0.249321 -0.292967 -1.849202
      four   a    0.211234 -0.967123  1.202234
foo   one    b   -1.046479 -1.250595  0.781722
             a    1.314373  0.333150  0.133331
qux   one    c    0.716789  0.616471 -0.298493
      two    b    0.385795 -0.915417 -1.367644

How can I count how many distinct values of one level are contained within each value of another level? (e.g. distinct values of level Y within each X)

E.g. in the case above the answer would be:

X    Y
bar  1
baz  3
foo  1
qux  2

Update

When I try df.groupby(level=[0, 1]).count()[0] I get:

            C  D  E
A    B
bar  one    1  1  1
     three  1  1  1
flux six    1  1  1
     three  1  1  1
foo  five   1  1  1
     one    1  1  1
     two    2  2  2
Amelio Vazquez-Reina asked Aug 04 '14 20:08



1 Answer

You can do the following: reset the index so that the levels become ordinary columns (which makes this easier), then group by X and calculate the number of unique values of Y in each group:

In [15]: df.reset_index().groupby('X')['Y'].nunique()
Out[15]:
X
bar    1
baz    3
foo    1
qux    2
Name: Y, dtype: int64
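For anyone who wants to try this end to end, here is a minimal self-contained sketch. The index tuples are copied from the question's example, but the data values are random placeholders (an assumption, since only the index matters for the count). The second variant, which uses `MultiIndex.to_frame`, is not part of the answer above; it is just one possible way to get the same counts while working only on the index.

```python
import numpy as np
import pandas as pd

# Index tuples taken from the question's example frame (levels X, Y, Z).
tuples = [
    ("bar", "one", "a"), ("bar", "one", "b"),
    ("baz", "three", "a"), ("baz", "three", "b"),
    ("baz", "two", "a"), ("baz", "four", "a"),
    ("foo", "one", "b"), ("foo", "one", "a"),
    ("qux", "one", "c"), ("qux", "two", "b"),
]
index = pd.MultiIndex.from_tuples(tuples, names=["X", "Y", "Z"])

# Placeholder values: the actual numbers do not affect the level counts.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(tuples), 3)),
                  index=index, columns=["A", "B", "C"])

# Approach from the answer: turn index levels into columns,
# then count the distinct Y values within each X.
counts = df.reset_index().groupby("X")["Y"].nunique()

# Possible alternative: build a frame from the index only,
# so the data columns are never copied.
counts_alt = df.index.to_frame(index=False).groupby("X")["Y"].nunique()

print(counts)
```

Both variants should give 1, 3, 1 and 2 for bar, baz, foo and qux respectively, matching the output the question asks for.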
joris answered Sep 20 '22 17:09