Pandas groupby and MultiIndex

Tags: python, pandas

Is there a way in pandas to group data by a predefined MultiIndex? By this I mean passing groupby not only the keys, but also the full set of values each key column can take, so that key combinations that never occur in the DataFrame still show up in the result.

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)
# stack the arrays as rows, then transpose so each one becomes a column
df = pd.DataFrame([a, b, c]).T
df.columns = ['a', 'b', 'c']
# number of rows in each observed (a, b, c) group
df.groupby(['a', 'b', 'c']).apply(len)

a    b    c    
bar  one  dull     1
     two  dull     1
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2
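For counting rows, groupby's built-in size() returns the same counts as apply(len). Either way, groupby only creates groups for key combinations that actually occur in the data, which is why ('bar', 'one', 'shiny') is missing from the output above. A quick check (the DataFrame is rebuilt from a dict here only to keep the snippet self-contained):

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'],
    'b': ['one', 'one', 'two', 'one', 'two', 'two', 'two'],
    'c': ['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'],
})

# size() counts the rows in each group, like apply(len)
counts = df.groupby(['a', 'b', 'c']).size()

# Only the 6 observed combinations appear, not all 2 * 2 * 2 = 8
print(len(counts))                              # 6
print(('bar', 'one', 'shiny') in counts.index)  # False
```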

But what I actually want is the following:

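# every (a, b, c) combination; the codes argument was named labels before pandas 0.24
mi = pd.MultiIndex(levels=[['foo', 'bar'], ['one', 'two'], ['dull', 'shiny']],
                   codes=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]])
# pseudocode: groupby() has no multi_index parameter
df.groupby(['a', 'b', 'c'], multi_index = mi).apply(len)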
a    b    c    
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2
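Since mi is just every combination of the three value lists, the same index can be built more readably with MultiIndex.from_product, which takes their Cartesian product. A sketch for comparison; the names argument is optional and is added here only so the levels carry the column names:

```python
import pandas as pd

# Explicit levels/codes construction of all 8 combinations
mi = pd.MultiIndex(
    levels=[['foo', 'bar'], ['one', 'two'], ['dull', 'shiny']],
    codes=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]],
)

# Same tuples in the same order, built as a Cartesian product
mi_product = pd.MultiIndex.from_product(
    [['foo', 'bar'], ['one', 'two'], ['dull', 'shiny']],
    names=['a', 'b', 'c'],
)

print(mi.equals(mi_product))  # True (equals compares values, not names)
print(list(mi_product[:2]))   # [('foo', 'one', 'dull'), ('foo', 'one', 'shiny')]
```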

One way I can see is to write an additional wrapper around the groupby object. Or perhaps this feature fits well with the pandas philosophy and could be included in the pandas library itself?
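In current pandas something close to this is built in, assuming the grouping columns can be converted to Categorical: declare the allowed values of each key as its categories, and groupby(..., observed=False) then emits every combination of categories, with size() reporting 0 for the ones that never occur. A sketch under that assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'],
    'b': ['one', 'one', 'two', 'one', 'two', 'two', 'two'],
    'c': ['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'],
})

# The categories play the role of the predefined MultiIndex levels
df['a'] = pd.Categorical(df['a'], categories=['foo', 'bar'])
df['b'] = pd.Categorical(df['b'], categories=['one', 'two'])
df['c'] = pd.Categorical(df['c'], categories=['dull', 'shiny'])

# observed=False: group over all category combinations, not just observed ones
counts = df.groupby(['a', 'b', 'c'], observed=False).size()
print(counts)
```

Passing observed explicitly matters because its default has been changing across pandas versions.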

asked Jun 10 '13 at 15:06 by norecces



1 Answer

Just reindex to the full MultiIndex and fillna!

In [14]: df.groupby(['a', 'b', 'c']).size().reindex(index=mi).fillna(0)
Out[14]: 
foo  one  dull     1
          shiny    1
     two  dull     1
          shiny    2
bar  one  dull     1
          shiny    0
     two  dull     1
          shiny    0
dtype: float64
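The result above is float64 because reindex first inserts NaN for the missing groups. Passing fill_value=0 to reindex avoids the NaN step, so the counts stay integers. The reindex step can also be packaged as the wrapper the question suggests; grouped_size_full below is a hypothetical helper name, not a pandas API, and it assumes missing combinations should count as zero:

```python
import pandas as pd


def grouped_size_full(df, key_values):
    """Hypothetical helper: row counts per group, including key combinations
    that never occur in df (their count is 0).

    key_values maps each grouping column to the full list of values it may take.
    """
    keys = list(key_values)
    # Every combination of the predefined values, levels named after the keys
    full_index = pd.MultiIndex.from_product(list(key_values.values()), names=keys)
    # Count observed groups, then align to the full index; absent groups get 0
    return df.groupby(keys).size().reindex(full_index, fill_value=0)


df = pd.DataFrame({
    'a': ['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'],
    'b': ['one', 'one', 'two', 'one', 'two', 'two', 'two'],
    'c': ['dull', 'shiny', 'dull', 'dull', 'dull', 'shiny', 'shiny'],
})
result = grouped_size_full(
    df, {'a': ['foo', 'bar'], 'b': ['one', 'two'], 'c': ['dull', 'shiny']}
)
print(result)
```

Values listed in key_values that never appear in a column at all also come out as 0, which is what "predefined" keys usually call for.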
answered Sep 22 '22 at 03:09 by Jeff
