
Add Missing Date Index in a multiindex dataframe

I am working with a multi index data frame that has a date column and location_id as indices.

import numpy as np
import pandas as pd

index_1 = ['2020-01-01', '2020-01-03', '2020-01-04']
index_2 = [100, 200, 300]

index = pd.MultiIndex.from_product([index_1, index_2],
                                   names=['Date', 'location_id'])

df = pd.DataFrame(np.random.randint(10, 100, 9), index)
df

                         0
Date       location_id    
2020-01-01 100          19
           200          75
           300          39
2020-01-03 100          11
           200          91
           300          80
2020-01-04 100          36
           200          56
           300          54

I want to fill in the missing dates, adding just one location_id for each missing date and setting its value to 0:

                         0
Date       location_id    
2020-01-01 100          19
           200          75
           300          39
2020-01-02 100           0
2020-01-03 100          11
           200          91
           300          80
2020-01-04 100          36
           200          56
           300          54

How can I achieve that? This is helpful, but only if my data frame were not multi-indexed.

Rob asked Jun 17 '20 19:06



2 Answers

You can get the unique values of the Date index level, generate all dates between the min and max with pd.date_range, and use difference against the unique Date values to get the missing ones. Then reindex df with the union of the original index and a MultiIndex.from_product built from the missing dates and the min of the location_id level.

# unique dates in the first index level
m = df.index.unique(level=0)
# reindex on the original index plus (missing date, smallest location_id) pairs
df = df.reindex(df.index.union(
                   pd.MultiIndex.from_product([pd.date_range(m.min(), m.max())
                                                .difference(pd.to_datetime(m))
                                                .strftime('%Y-%m-%d'), 
                                             [df.index.get_level_values(1).min()]])), 
                fill_value=0)
print(df)
                 0
2020-01-01 100  91
           200  49
           300  19
2020-01-02 100   0
2020-01-03 100  41
           200  25
           300  51
2020-01-04 100  44
           200  40
           300  54
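
The one-liner above can be hard to follow, so here is a minimal step-by-step sketch of the same idea. It assumes fixed values instead of random ones so the result is reproducible, and the names present, missing, first_location, extra and filled are only illustrative. Passing names=df.index.names to from_product is an addition on my part so that the index names survive the union.

```python
import pandas as pd

# Same shape as the question's frame, with fixed values instead of random ones
dates = ['2020-01-01', '2020-01-03', '2020-01-04']
locations = [100, 200, 300]
index = pd.MultiIndex.from_product([dates, locations],
                                   names=['Date', 'location_id'])
df = pd.DataFrame({0: range(10, 19)}, index=index)

# Step 1: the dates that are present, parsed as datetimes
present = pd.to_datetime(df.index.unique(level='Date'))

# Step 2: every calendar day between the first and last date, minus the present ones
missing = pd.date_range(present.min(), present.max()).difference(present)

# Step 3: pair each missing date (back as a string) with the smallest location_id
first_location = df.index.get_level_values('location_id').min()
extra = pd.MultiIndex.from_product(
    [missing.strftime('%Y-%m-%d'), [first_location]],
    names=df.index.names,
)

# Step 4: reindex on the union; rows that did not exist get the value 0
filled = df.reindex(df.index.union(extra), fill_value=0)
print(filled)
```

Because fill_value=0 is supplied at reindex time, no NaN is ever introduced and the column should keep its integer dtype.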

Instead of pd.MultiIndex.from_product, you can also use product from itertools. The result is the same, and it may be faster.

from itertools import product
df = df.reindex(df.index.union(
                  list(product(pd.date_range(m.min(), m.max())
                                 .difference(pd.to_datetime(m))
                                 .strftime('%Y-%m-%d'),
                               [df.index.get_level_values(1).min()]))),
                fill_value=0)
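
As a rough check of the "same result" claim, the sketch below builds the union both ways on a small made-up index and compares them. The variable names are hypothetical. As far as I can tell, one side effect differs: the union with an unnamed MultiIndex drops the level names (which would explain the missing header row in the printed output above), while the union with a plain list of tuples keeps them.

```python
from itertools import product

import pandas as pd

# Small illustrative index with one gap (2020-01-02)
index = pd.MultiIndex.from_product(
    [['2020-01-01', '2020-01-03'], [100, 200]],
    names=['Date', 'location_id'],
)
missing_dates = ['2020-01-02']
first_location = [100]

# Variant 1: union with an (unnamed) MultiIndex
via_multiindex = index.union(
    pd.MultiIndex.from_product([missing_dates, first_location])
)

# Variant 2: union with a plain list of tuples
via_tuples = index.union(list(product(missing_dates, first_location)))

# equals() compares the entries only and ignores the level names
print(via_multiindex.equals(via_tuples))
print(list(via_multiindex.names))
print(list(via_tuples.names))
```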
Ben.T answered Sep 25 '22 01:09


A pandas index is immutable, so you need to construct a new one. Move the index level location_id to a column, keep the first row for each date, and call asfreq to create rows for the missing dates; ffill then fills in location_id for those new rows. Assign the result to df2. Finally, use df.align to join both indices and fillna to set the new values to 0.

df1 = df.reset_index(-1)
df2 = df1.loc[~df1.index.duplicated()].asfreq('D').ffill()
df_final = df.align(df2.set_index('location_id', append=True))[0].fillna(0)

Out[75]:
                           0
Date       location_id
2020-01-01 100          19.0
           200          75.0
           300          39.0
2020-01-02 100           0.0
2020-01-03 100          11.0
           200          91.0
           300          80.0
2020-01-04 100          36.0
           200          56.0
           300          54.0
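
For reference, here is a self-contained sketch of the same steps under two assumptions that are mine rather than the answer's. First, the Date level is built with pd.to_datetime, because asfreq works on a datetime-like index and the question's dates are plain strings. Second, location_id is cast back to int after ffill, since asfreq introduces NaN and turns that column into floats. The fixed values and the names first_rows, target and df_final are illustrative.

```python
import pandas as pd

# Assumption: Date holds real datetimes, not strings
dates = pd.to_datetime(['2020-01-01', '2020-01-03', '2020-01-04'])
index = pd.MultiIndex.from_product([dates, [100, 200, 300]],
                                   names=['Date', 'location_id'])
df = pd.DataFrame({0: range(10, 19)}, index=index)

# Move location_id to a column and keep the first row for each date
df1 = df.reset_index('location_id')
first_rows = df1.loc[~df1.index.duplicated()]

# asfreq('D') adds a row per missing day; ffill copies location_id forward
df2 = first_rows.asfreq('D').ffill()
df2['location_id'] = df2['location_id'].astype(int)  # undo the float upcast

# Outer-align df with the new (Date, location_id) pairs, then fill the gaps
target = df2.set_index('location_id', append=True)
df_final = df.align(target)[0].fillna(0).astype(int)
print(df_final.sort_index())
```

Only the index of target matters here: align(...)[0] returns df conformed to the joined index, so the forward-filled values in df2 never reach the result.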
Andy L. answered Sep 23 '22 01:09