Add Missing Date Index in a multiindex dataframe

Tags: python, pandas, dataframe

I am working with a multi-index data frame that has Date and location_id as its index levels.

import numpy as np
import pandas as pd

index_1 = ['2020-01-01', '2020-01-03', '2020-01-04']
index_2 = [100, 200, 300]

index = pd.MultiIndex.from_product([index_1, index_2],
                                   names=['Date', 'location_id'])

df = pd.DataFrame(np.random.randint(10, 100, 9), index=index)
df

                         0
Date       location_id    
2020-01-01 100          19
           200          75
           300          39
2020-01-03 100          11
           200          91
           300          80
2020-01-04 100          36
           200          56
           300          54

I want to fill in the missing dates, adding each one with just a single location_id and a value of 0:

                         0
Date       location_id    
2020-01-01 100          19
           200          75
           300          39
2020-01-02 100           0
2020-01-03 100          11
           200          91
           300          80
2020-01-04 100          36
           200          56
           300          54

How can I achieve that? This is helpful, but only if my data frame were not multi-indexed.

asked Jun 17 '20 19:06

Rob

2 Answers

You can get the unique values of the Date index level, generate all dates between the min and max with pd.date_range, and use difference against the unique dates to find the missing ones. Then reindex df with the union of the original index and a MultiIndex.from_product built from the missing dates and the min of the location_id level.

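# unique dates in the first index level
m = df.index.unique(level=0)
# dates between min and max that are absent from the index, as strings
missing = (pd.date_range(m.min(), m.max())
             .difference(pd.to_datetime(m))
             .strftime('%Y-%m-%d'))
# pair each missing date with the smallest location_id
new_rows = pd.MultiIndex.from_product([missing,
                                       [df.index.get_level_values(1).min()]])
# reindex on the union of old and new rows; new rows get 0
df = df.reindex(df.index.union(new_rows), fill_value=0)
print(df)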
                 0
2020-01-01 100  91
           200  49
           300  19
2020-01-02 100   0
2020-01-03 100  41
           200  25
           300  51
2020-01-04 100  44
           200  40
           300  54
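
The printed result above has lost the Date and location_id level names, because the helper index is built without names. A minimal sketch of the same steps that passes the names along is shown here; the function name add_missing_dates, the column name value, and the fixed sample data are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def add_missing_dates(df, fill_value=0):
    """Add one row (smallest location_id) for every missing calendar day."""
    # Unique dates of the first level, parsed so they can be compared
    dates = pd.to_datetime(df.index.unique(level=0))
    # Calendar days between the first and last date that never occur
    missing = pd.date_range(dates.min(), dates.max()).difference(dates)
    # Format them like the existing string labels
    missing = missing.strftime('%Y-%m-%d')
    first_loc = df.index.get_level_values(1).min()
    # Reuse the level names so the union keeps them
    new_rows = pd.MultiIndex.from_product([missing, [first_loc]],
                                          names=list(df.index.names))
    return df.reindex(df.index.union(new_rows), fill_value=fill_value)

# Hypothetical sample frame with fixed values instead of random ones
index = pd.MultiIndex.from_product(
    [['2020-01-01', '2020-01-03', '2020-01-04'], [100, 200, 300]],
    names=['Date', 'location_id'])
df = pd.DataFrame({'value': range(1, 10)}, index=index)

result = add_missing_dates(df)
print(result)
```

This assumes the Date level holds strings in the '%Y-%m-%d' format, as in the question; for a datetime level the strftime step would be dropped.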

Instead of pd.MultiIndex.from_product, you can also use product from itertools. It gives the same result and may be faster.

from itertools import product
df = df.reindex(df.index.union(
                  list(product(pd.date_range(m.min(), m.max())
                                 .difference(pd.to_datetime(m))
                                 .strftime('%Y-%m-%d'),
                               [df.index.get_level_values(1).min()]))),
                fill_value=0)
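
One way to check the "same result" claim is to run both variants on a small frame and compare the resulting rows and values. This is only a sketch: the fixed sample data, the column name value, and the variable names a and b are assumptions for illustration, and it makes no statement about speed.

```python
from itertools import product

import pandas as pd

# Hypothetical sample frame with fixed values instead of random ones
index = pd.MultiIndex.from_product(
    [['2020-01-01', '2020-01-03', '2020-01-04'], [100, 200, 300]],
    names=['Date', 'location_id'])
df = pd.DataFrame({'value': range(1, 10)}, index=index)

m = df.index.unique(level=0)
missing = (pd.date_range(m.min(), m.max())
             .difference(pd.to_datetime(m))
             .strftime('%Y-%m-%d'))
first_loc = df.index.get_level_values(1).min()

# Variant 1: helper index built with MultiIndex.from_product
a = df.reindex(df.index.union(
        pd.MultiIndex.from_product([missing, [first_loc]])), fill_value=0)
# Variant 2: plain list of tuples built with itertools.product
b = df.reindex(df.index.union(
        list(product(missing, [first_loc]))), fill_value=0)

# Compare row labels and values of the two results
same_rows = list(a.index) == list(b.index)
same_values = bool((a['value'].to_numpy() == b['value'].to_numpy()).all())
print(same_rows, same_values)
```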

answered Sep 25 '22 01:09

Ben.T

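A pandas index is immutable, so you need to construct a new one. Move the location_id index level into a column, keep the first row for each date, and call asfreq to create rows for the missing dates; assign the result to df2. Finally, use df.align to join both indices and fillna to set the new rows to 0. Note that asfreq only lines up with the existing rows when the Date level is datetime-like, so string dates like those in the question likely need pd.to_datetime first.

# move location_id out of the index so that Date is the only level
df1 = df.reset_index(-1)
# keep the first row per date, add a row for each missing day, forward-fill location_id
df2 = df1.loc[~df1.index.duplicated()].asfreq('D').ffill()
# align on the union of both indices and set the newly added rows to 0
df_final = df.align(df2.set_index('location_id', append=True))[0].fillna(0)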

Out[75]:
                           0
Date       location_id
2020-01-01 100          19.0
           200          75.0
           300          39.0
2020-01-02 100           0.0
2020-01-03 100          11.0
           200          91.0
           300          80.0
2020-01-04 100          36.0
           200          56.0
           300          54.0
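
The values above come back as floats because the aligned frame briefly holds NaN for the new row. A self-contained sketch of the same approach that restores integers is given here, assuming the Date level is already datetime; the two astype calls, the column name value, and the fixed sample data are additions for illustration and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample frame: datetime Date level, fixed values
index = pd.MultiIndex.from_product(
    [pd.to_datetime(['2020-01-01', '2020-01-03', '2020-01-04']),
     [100, 200, 300]],
    names=['Date', 'location_id'])
df = pd.DataFrame({'value': range(1, 10)}, index=index)

# Move location_id into a column so Date is the only index level
df1 = df.reset_index(-1)
# First row per date, then one row per calendar day; ffill gives the
# inserted day a location_id taken from the previous day
df2 = df1.loc[~df1.index.duplicated()].asfreq('D').ffill()
# asfreq introduced NaN, so location_id became float; cast it back (added step)
df2 = df2.astype({'location_id': int})
# Outer-align on both indices, keep the left frame, zero the new rows
df_final = df.align(df2.set_index('location_id', append=True))[0].fillna(0)
# Restore the integer dtype lost to NaN during alignment (added step)
df_final = df_final.astype(int)
print(df_final)
```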

answered Sep 23 '22 01:09

Andy L.
