
How to iterate over pandas multiindex dataframe using index

Tags:

python

pandas

I have a DataFrame df that looks like this. date and Time are the two levels of its MultiIndex.

                           observation1   observation2
    date          Time
    2012-11-02    9:15:00      79.373668      224
                  9:16:00      130.841316     477
    2012-11-03    9:15:00      45.312814      835
                  9:16:00      123.776946     623
                  9:17:00      153.76646      624
                  9:18:00      463.276946     626
                  9:19:00      663.176934     622
                  9:20:00      763.77333      621
    2012-11-04    9:15:00      115.449437     122
                  9:16:00      123.776946     555
                  9:17:00      153.76646      344
                  9:18:00      463.276946     212

I want to run a complex process over each daily block of data.

Pseudo code would look like

    for count in df(level 0 index):
        new_df = get only chunk for count
        complex_process(new_df)

So, first of all, I could not find a way to access only the block for a single date, for example

    2012-11-03    9:15:00      45.312814      835
                  9:16:00      123.776946     623
                  9:17:00      153.76646      624
                  9:18:00      463.276946     626
                  9:19:00      663.176934     622
                  9:20:00      763.77333      621

and then send it for processing. I am doing this in a for loop because I am not sure whether there is a way to do it without specifying the exact value of the level 0 index. I did some basic searching and found df.index.get_level_values(0), but it returns every value, repeats included, so the loop runs multiple times for each day. I want to create one DataFrame per day and send it for processing.

Yantraguru asked Sep 19 '14 08:09




1 Answer

One easy way would be to group by the first level of the index. Iterating over the groupby object yields each group key together with a sub-frame containing that group's rows.

    In [136]: for date, new_df in df.groupby(level=0):
         ...:     print(new_df)
         ...:
                        observation1  observation2
    date       Time
    2012-11-02 9:15:00     79.373668           224
               9:16:00    130.841316           477
                        observation1  observation2
    date       Time
    2012-11-03 9:15:00     45.312814           835
               9:16:00    123.776946           623
               9:17:00    153.766460           624
               9:18:00    463.276946           626
               9:19:00    663.176934           622
               9:20:00    763.773330           621
                        observation1  observation2
    date       Time
    2012-11-04 9:15:00    115.449437           122
               9:16:00    123.776946           555
               9:17:00    153.766460           344
               9:18:00    463.276946           212
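For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. It rebuilds a few rows of the question's frame and passes each day's block to a placeholder function. The body of complex_process is purely hypothetical (the question never shows it), and the index values are assumed to be plain strings. The second loop is an alternative I would suggest, not part of the approach above: it de-duplicates the level 0 values with .unique() before selecting each block, which addresses the repeated iterations the question ran into with get_level_values(0).

```python
import pandas as pd

# Rebuild a few rows of the question's frame (index values assumed to be strings).
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
        ("2012-11-03", "9:17:00"),
        ("2012-11-04", "9:15:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {
        "observation1": [79.373668, 130.841316, 45.312814,
                         123.776946, 153.76646, 115.449437],
        "observation2": [224, 477, 835, 623, 624, 122],
    },
    index=index,
)


def complex_process(day_df):
    """Hypothetical stand-in: returns (row count, sum of observation2)."""
    return len(day_df), int(day_df["observation2"].sum())


# groupby(level=0) yields each date exactly once, with that day's rows.
results = {date: complex_process(day_df) for date, day_df in df.groupby(level=0)}

# Alternative: de-duplicate the level 0 values, then select each block.
# drop_level=False keeps both index levels on the selected block.
blocks = {
    date: df.xs(date, level=0, drop_level=False)
    for date in df.index.get_level_values(0).unique()
}
```

The groupby version is usually the simpler choice; the .unique() loop is mainly useful when you need the list of dates for something else as well.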
chrisb answered Oct 14 '22 18:10