pandas fill missing dates in time series

Tags: python, pandas

I have a dataframe which has aggregated data for some days. I want to add in the missing days.

I was following another post, "Add missing dates to pandas dataframe", but unfortunately it overwrote my results (maybe the functionality has changed slightly?). The code is below:

import random
import datetime as dt
import numpy as np
import pandas as pd

def generate_row(year, month, day):
    # endlessly yield rows of [date, 4 random floats] for the given day
    while True:
        date = dt.datetime(year=year, month=month, day=day)
        data = np.random.random(size=4)
        yield [date] + list(data)

# days I have data for
dates = [(2000, 1, 1), (2000, 1, 2), (2000, 2, 4)]
generators = [generate_row(*date) for date in dates]

# get 5 data points for each
data = [next(generator) for generator in generators for _ in range(5)]

df = pd.DataFrame(data, columns=['date'] + ['f'+str(i) for i in range(1,5)])

# df
groupby_day = df.groupby(pd.PeriodIndex(data=df.date, freq='D'))
# numeric_only=True skips the datetime 'date' column, which cannot be summed
results = groupby_day.sum(numeric_only=True)

idx = pd.date_range(min(df.date), max(df.date))
results.reindex(idx, fill_value=0)

[Image: results before filling in the missing date indices]

[Image: results after]

936

asked Nov 10 '17 21:11

Alter

1 Answer

You need to use period_range rather than date_range:

In [11]: idx = pd.period_range(min(df.date), max(df.date))
    ...: results.reindex(idx, fill_value=0)
    ...:
Out[11]:
                  f1        f2        f3        f4
2000-01-01  2.049157  1.962635  2.756154  2.224751
2000-01-02  2.675899  2.587217  1.540823  1.606150
2000-01-03  0.000000  0.000000  0.000000  0.000000
2000-01-04  0.000000  0.000000  0.000000  0.000000
2000-01-05  0.000000  0.000000  0.000000  0.000000
2000-01-06  0.000000  0.000000  0.000000  0.000000
2000-01-07  0.000000  0.000000  0.000000  0.000000
2000-01-08  0.000000  0.000000  0.000000  0.000000
2000-01-09  0.000000  0.000000  0.000000  0.000000
2000-01-10  0.000000  0.000000  0.000000  0.000000
2000-01-11  0.000000  0.000000  0.000000  0.000000
2000-01-12  0.000000  0.000000  0.000000  0.000000
2000-01-13  0.000000  0.000000  0.000000  0.000000
2000-01-14  0.000000  0.000000  0.000000  0.000000
2000-01-15  0.000000  0.000000  0.000000  0.000000
2000-01-16  0.000000  0.000000  0.000000  0.000000
2000-01-17  0.000000  0.000000  0.000000  0.000000
2000-01-18  0.000000  0.000000  0.000000  0.000000
2000-01-19  0.000000  0.000000  0.000000  0.000000
2000-01-20  0.000000  0.000000  0.000000  0.000000
2000-01-21  0.000000  0.000000  0.000000  0.000000
2000-01-22  0.000000  0.000000  0.000000  0.000000
2000-01-23  0.000000  0.000000  0.000000  0.000000
2000-01-24  0.000000  0.000000  0.000000  0.000000
2000-01-25  0.000000  0.000000  0.000000  0.000000
2000-01-26  0.000000  0.000000  0.000000  0.000000
2000-01-27  0.000000  0.000000  0.000000  0.000000
2000-01-28  0.000000  0.000000  0.000000  0.000000
2000-01-29  0.000000  0.000000  0.000000  0.000000
2000-01-30  0.000000  0.000000  0.000000  0.000000
2000-01-31  0.000000  0.000000  0.000000  0.000000
2000-02-01  0.000000  0.000000  0.000000  0.000000
2000-02-02  0.000000  0.000000  0.000000  0.000000
2000-02-03  0.000000  0.000000  0.000000  0.000000
2000-02-04  1.856158  2.892620  2.986166  2.793448

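This is because your groupby uses a PeriodIndex rather than a DatetimeIndex, so the Timestamp labels produced by date_range match none of the existing Period labels and every row gets filled with fill_value: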

df.groupby(pd.PeriodIndex(data=df.date, freq='D'))
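To see the label mismatch in isolation, here is a minimal sketch. The tiny `results` frame and its values are made up for illustration and are not the asker's data; it only assumes that `reindex` treats Period and Timestamp labels as different keys, which is what the behaviour in the question suggests.

```python
import pandas as pd

# Hypothetical stand-in for the grouped results: two days, indexed by Period.
results = pd.DataFrame(
    {"f1": [1.0, 2.0]},
    index=pd.PeriodIndex(["2000-01-01", "2000-01-03"], freq="D"),
)

# Timestamp labels: none of them match the Period labels, so every row is filled.
dt_idx = pd.date_range("2000-01-01", "2000-01-03")
mismatched = results.reindex(dt_idx, fill_value=0)

# Period labels: existing rows are kept and only the missing day is filled.
p_idx = pd.period_range("2000-01-01", "2000-01-03", freq="D")
matched = results.reindex(p_idx, fill_value=0)

print(mismatched)
print(matched)
```

The first result has lost the original values, while the second keeps them and only adds the missing day.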

You could have instead used a pd.Grouper:

df.groupby(pd.Grouper(key="date", freq='D'))

which would have given a DatetimeIndex.
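Putting the Grouper variant together with the original date_range reindex might look like the sketch below. The sample data is generated here purely for illustration (same three days as in the question, with a seeded random generator and `DataFrame.insert`, neither of which is from the original post).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: five rows for each of the three days in the question.
rng = np.random.default_rng(0)
days = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-02-04"])
df = pd.DataFrame(rng.random((15, 4)), columns=["f1", "f2", "f3", "f4"])
df.insert(0, "date", days.repeat(5))

# Grouping with a daily Grouper gives a DatetimeIndex instead of a PeriodIndex.
results = df.groupby(pd.Grouper(key="date", freq="D")).sum()

# The date_range labels are Timestamps too, so reindex keeps the existing rows.
idx = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
filled = results.reindex(idx, fill_value=0)

print(filled.head())
```

As far as I know, a Grouper with a freq already emits the empty days in between, much like resample, so the reindex here mostly matters if you want the range to extend beyond the first and last dates in the data.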

147

answered Sep 20 '22 13:09

Andy Hayden
