I have a MultiIndexed DataFrame df1, and would like to loop over it in such a way as to in each instance of the loop have a DataFrame with a regular non-hierarchical index which is the subset of df1 corresponding to the outer index entries. I.e., if i have:
I want to get
and subsequently C1, C2, etc. I also don't know what the names of these will actually be (C1, etc., just being placeholders here), so would just like to loop over the number of Ci values I have.
I have been stumbling around with iterrows
and various loops and not getting any tangible results and don't really know how to proceed. I feel like a simple solution should exist but couldn't find anything that looked helpful in the documentation, probably due to my own lack of understanding.
You can also construct a MultiIndex from a DataFrame directly, using the method MultiIndex.
A multi-index dataframe has multi-level, or hierarchical indexing. We can easily convert the multi-level index into the column by the reset_index() method. DataFrame. reset_index() is used to reset the index to default and make the index a column of the dataframe.
The MultiIndex object is the hierarchical analogue of the standard Index object which typically stores the axis labels in pandas objects. You can think of MultiIndex as an array of tuples where each tuple is unique. A MultiIndex can be created from a list of arrays (using MultiIndex.
Using a modified example from here
In [30]: def mklbl(prefix,n):
return ["%s%s" % (prefix,i) for i in range(n)]
....:
In [31]: columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
('b','foo'),('b','bah')],
names=['lvl0', 'lvl1'])
In [33]: index = MultiIndex.from_product([mklbl('A',4),mklbl('B',2)])
In [34]: df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
index=index,
columns=columns).sortlevel().sortlevel(axis=1)
In [35]: df
Out[35]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 1 0 3 2
B1 5 4 7 6
A1 B0 9 8 11 10
B1 13 12 15 14
A2 B0 17 16 19 18
B1 21 20 23 22
A3 B0 25 24 27 26
B1 29 28 31 30
In [36]: df.loc['A0']
Out[36]:
lvl0 a b
lvl1 bar foo bah foo
B0 1 0 3 2
B1 5 4 7 6
In [37]: df.loc['A1']
Out[37]:
lvl0 a b
lvl1 bar foo bah foo
B0 9 8 11 10
B1 13 12 15 14
No looping is necessary.
You can also select these in order to return a frame (with the original MI)
e.g. df.loc[['A1']]
If you want to get the values in the index:
In [38]: df.index.get_level_values(0).unique()
Out[38]: array(['A0', 'A1', 'A2', 'A3'], dtype=object)
Are you trying to do something like this?
for i in set(df.index):
print df.loc[i].reset_index()
set(df.index)
returns a set of unique tuples of your multi-index (hierarchical index).df.loc[i].reset_index()
... df.loc[i]
of course returns a subset of your original dataframe, and the .reset_index()
part will convert the index to columns If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With