Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Looping over a MultiIndex in pandas

I have a MultiIndexed DataFrame df1, and would like to loop over it in such a way as to in each instance of the loop have a DataFrame with a regular non-hierarchical index which is the subset of df1 corresponding to the outer index entries. I.e., if i have:

FirstTable

I want to get

SecondTable

and subsequently C1, C2, etc. I also don't know what the names of these will actually be (C1, etc., just being placeholders here), so would just like to loop over the number of Ci values I have.

I have been stumbling around with iterrows and various loops and not getting any tangible results and don't really know how to proceed. I feel like a simple solution should exist but couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.

like image 351
Charles Dillon Avatar asked Jul 17 '14 15:07

Charles Dillon


People also ask

Can DataFrame have multiple indexes?

You can also construct a MultiIndex from a DataFrame directly, using the method MultiIndex.

How do I reset MultiIndex pandas?

A multi-index dataframe has multi-level, or hierarchical indexing. We can easily convert the multi-level index into the column by the reset_index() method. DataFrame. reset_index() is used to reset the index to default and make the index a column of the dataframe.

What is a MultiIndex in pandas?

The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. You can think of MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.


2 Answers

Using a modified example from here

In [30]: def mklbl(prefix,n):
        return ["%s%s" % (prefix,i)  for i in range(n)]
   ....: 

In [31]: columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
                                  ('b','foo'),('b','bah')],
                                   names=['lvl0', 'lvl1'])

In [33]: index = MultiIndex.from_product([mklbl('A',4),mklbl('B',2)])

In [34]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
               index=index,
               columns=columns).sortlevel().sortlevel(axis=1)

In [35]: df
Out[35]: 
lvl0     a         b     
lvl1   bar  foo  bah  foo
A0 B0    1    0    3    2
   B1    5    4    7    6
A1 B0    9    8   11   10
   B1   13   12   15   14
A2 B0   17   16   19   18
   B1   21   20   23   22
A3 B0   25   24   27   26
   B1   29   28   31   30

In [36]: df.loc['A0']
Out[36]: 
lvl0    a         b     
lvl1  bar  foo  bah  foo
B0      1    0    3    2
B1      5    4    7    6

In [37]: df.loc['A1']
Out[37]: 
lvl0    a         b     
lvl1  bar  foo  bah  foo
B0      9    8   11   10
B1     13   12   15   14

No looping is necessary.

You can also select these in order to return a frame (with the original MI) e.g. df.loc[['A1']]

If you want to get the values in the index:

In [38]: df.index.get_level_values(0).unique()
Out[38]: array(['A0', 'A1', 'A2', 'A3'], dtype=object)
like image 174
Jeff Avatar answered Sep 17 '22 09:09

Jeff


Are you trying to do something like this?

for i in set(df.index):
    print df.loc[i].reset_index()
  1. set(df.index) returns a set of unique tuples of your multi-index (hierarchical index).
  2. df.loc[i].reset_index()... df.loc[i] of course returns a subset of your original dataframe, and the .reset_index() part will convert the index to columns
like image 36
John Avatar answered Sep 20 '22 09:09

John