I often have DataFrames with a MultiIndex, and I'd like to iterate over groups of rows whose higher-level index values are equal. It basically looks like this:
from random import choice
import pandas as pd

N = 100
df = pd.DataFrame([choice([1, 2, 3]) for _ in range(N)],
                  columns=["A"],
                  index=pd.MultiIndex.from_tuples([(choice("ab"), choice("cd"), choice("de"))
                                                   for _ in range(N)]))

for idx in zip(df.index.get_level_values(0), df.index.get_level_values(1)):
    df_select = df.loc[idx]  # .ix was removed in pandas 1.0; .loc does the label lookup
Is there a way to do the for loop iteration more neatly?
DataFrame.iterrows() iterates over DataFrame rows. It yields (index, Series) pairs, where the index is the row's index label and the Series holds the row's data. To get a value from the Series, use the column name, for example row["A"].
To revert a DataFrame's index from a MultiIndex to a single default index, use the built-in pandas method reset_index(). It returns a DataFrame with the new index, or None if inplace=True.
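As a rough illustration of these two methods on a MultiIndex frame, here is a minimal sketch. The three-row frame and the level names "first", "second" and "third" are made up for the example and do not come from the question.

```python
import pandas as pd

# Tiny hand-written frame; the level names are hypothetical
index = pd.MultiIndex.from_tuples(
    [("a", "c", "d"), ("a", "c", "e"), ("b", "d", "d")],
    names=["first", "second", "third"],
)
df = pd.DataFrame({"A": [1, 2, 3]}, index=index)

rows = []
for idx, row in df.iterrows():
    # idx is the full index tuple of the row; row is a Series keyed by column name
    rows.append((idx, row["A"]))

# reset_index() moves the index levels into ordinary columns
# and replaces the index with a default integer range
flat = df.reset_index()
```

Note that iterrows() visits one row at a time, so by itself it does not group rows that share the same higher-level index values.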
Use groupby. The index of the df_select view includes the first two level values, but otherwise it is similar to your example.
for idx, df_select in df.groupby(level=[0, 1]):
    ...
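To make the difference in the resulting index concrete, here is a small self-contained sketch of the groupby loop. The fixed random seed, the `groups` dictionary and the `droplevel` step are additions for illustration only; dropping the grouped levels is one possible way to get a frame indexed by the remaining level alone, as in the question's loop.

```python
import random
import pandas as pd

random.seed(0)  # fixed seed so the illustration is reproducible
N = 100
index = pd.MultiIndex.from_tuples(
    [(random.choice("ab"), random.choice("cd"), random.choice("de")) for _ in range(N)]
)
df = pd.DataFrame({"A": [random.choice([1, 2, 3]) for _ in range(N)]}, index=index)

groups = {}
for idx, df_select in df.groupby(level=[0, 1]):
    # idx is a tuple such as ("a", "c"); df_select still carries all three index levels
    # Drop the two grouped levels so only the last level remains in the index
    groups[idx] = df_select.droplevel([0, 1])
```

Unlike the loop over zipped level values, groupby visits each distinct (level 0, level 1) pair exactly once.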
As an alternative to groupby, you can map a lambda function over the index. This has the advantage that you do not have to specify the number of levels: it picks all levels except the very last one.
for idx in df.index.map(lambda x: x[:-1]):
    df_select = df.loc[idx]  # .ix was removed in pandas 1.0
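A related sketch of the same "all levels but the last" idea is shown below. It swaps in `MultiIndex.droplevel(-1)` for the lambda, since mapping index tuples to shorter tuples may not behave identically across pandas versions, and it adds `unique()` so that each prefix is visited once rather than once per row. The seed, the `sort_index()` call and the `selections` dictionary are assumptions made for the example.

```python
import random
import pandas as pd

random.seed(1)  # fixed seed so the illustration is reproducible
N = 50
index = pd.MultiIndex.from_tuples(
    [(random.choice("ab"), random.choice("cd"), random.choice("de")) for _ in range(N)]
)
df = pd.DataFrame({"A": [random.choice([1, 2, 3]) for _ in range(N)]}, index=index)
df = df.sort_index()  # a sorted index makes partial-key lookups efficient

# Remove the last level, then keep each remaining prefix only once
prefixes = df.index.droplevel(-1).unique()

selections = {}
for idx in prefixes:
    # idx is a tuple covering every level except the last
    selections[idx] = df.loc[idx, :]
```

Because the prefix is derived from the index itself, the same loop works unchanged for an index with more levels.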