I have a DataFrame df which looks like this. date and Time are the two levels of its MultiIndex.
                        observation1  observation2
    date       Time
    2012-11-02 9:15:00     79.373668           224
               9:16:00    130.841316           477
    2012-11-03 9:15:00     45.312814           835
               9:16:00    123.776946           623
               9:17:00     153.76646           624
               9:18:00    463.276946           626
               9:19:00    663.176934           622
               9:20:00     763.77333           621
    2012-11-04 9:15:00    115.449437           122
               9:16:00    123.776946           555
               9:17:00     153.76646           344
               9:18:00    463.276946           212
I want to run a complex process over each daily block of data.
The pseudocode would look like this:
    for count in df(level 0 index):
        new_df = get only chunk for count
        complex_process(new_df)
First of all, I could not find a way to access only the block for a single date, for example
    2012-11-03 9:15:00     45.312814           835
               9:16:00    123.776946           623
               9:17:00     153.76646           624
               9:18:00    463.276946           626
               9:19:00    663.176934           622
               9:20:00     763.77333           621
and then send it for processing. I am doing this in a for loop because I am not sure there is a way to do it without spelling out each exact value of the level 0 index. After some basic searching I found df.index.get_level_values(0), but it returns every value, including the repeats, which makes the loop run multiple times for each day. I want to create one DataFrame per day and send it for processing.
One easy way is to group by the first level of the index. Iterating over the groupby object yields each group key together with a sub-frame containing that group's rows.
    In [136]: for date, new_df in df.groupby(level=0):
         ...:     print(new_df)
         ...:
                        observation1  observation2
    date       Time
    2012-11-02 9:15:00     79.373668           224
               9:16:00    130.841316           477
                        observation1  observation2
    date       Time
    2012-11-03 9:15:00     45.312814           835
               9:16:00    123.776946           623
               9:17:00    153.766460           624
               9:18:00    463.276946           626
               9:19:00    663.176934           622
               9:20:00    763.773330           621
                        observation1  observation2
    date       Time
    2012-11-04 9:15:00    115.449437           122
               9:16:00    123.776946           555
               9:17:00    153.766460           344
               9:18:00    463.276946           212
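For completeness, here is a self-contained sketch of the same idea that you can paste and run. It rebuilds a shortened version of the frame from the question, assuming the dates and times are stored as plain strings (the question does not say what dtype they are). `complex_process` is only a hypothetical stand-in for the real processing. The second loop shows roughly how the explicit-loop approach from the question could be made to work, by de-duplicating the level 0 values with `unique()` and selecting each block with `xs`.

```python
import pandas as pd

# Shortened copy of the frame from the question.
# Assumption: date and Time are plain strings here.
index = pd.MultiIndex.from_tuples(
    [
        ("2012-11-02", "9:15:00"),
        ("2012-11-02", "9:16:00"),
        ("2012-11-03", "9:15:00"),
        ("2012-11-03", "9:16:00"),
        ("2012-11-03", "9:17:00"),
    ],
    names=["date", "Time"],
)
df = pd.DataFrame(
    {
        "observation1": [79.373668, 130.841316, 45.312814, 123.776946, 153.76646],
        "observation2": [224, 477, 835, 623, 624],
    },
    index=index,
)


def complex_process(day_df):
    # Hypothetical placeholder: the real processing would go here.
    return day_df["observation1"].sum()


# Approach 1: groupby on the first index level yields one sub-frame per date.
results = {date: complex_process(day_df) for date, day_df in df.groupby(level=0)}

# Approach 2: loop over the de-duplicated level 0 values and
# pull out each day's block, keeping both index levels.
rows_per_day = {}
for date in df.index.get_level_values(0).unique():
    day_df = df.xs(date, level=0, drop_level=False)
    rows_per_day[date] = len(day_df)

print(results)
print(rows_per_day)
```

The groupby version is usually the simpler of the two, since it splits the frame once instead of doing a separate lookup per date.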