Let's say I've got the following table:
ProdID Date Val1 Val2 Val3
Prod1 4/1/2019 1 3 4
Prod1 4/3/2019 2 3 54
Prod1 4/4/2019 3 4 54
Prod2 4/1/2019 1 3 3
Prod2 4/2/2019 1 3 4
Prod2 4/3/2019 2 4 4
Prod2 4/4/2019 2 5 3
Prod2
entries are populated correctly as we've got the data from 4/1/2019
to 4/4/2019
.
Prod1
has 1 missing date - 4/2/2019
.
I would like to find missing dates for all ProdIDs and fill in Val1-3 with data copied from the last of previous entry. For instance, I would like to copy data from 4/1/2019
to 4/2/2019
ProdID Date Val1 Val2 Val3
Prod1 4/1/2019 1 3 4
Prod1 4/2/2019 1 3 4
Prod1 4/3/2019 2 3 54
Prod1 4/4/2019 3 4 54
Prod2 4/1/2019 1 3 3
Prod2 4/2/2019 1 3 4
Prod2 4/3/2019 2 4 4
Prod2 4/4/2019 2 5 3
To add missing dates to Python Pandas DataFrame, we can use the DatetimeIndex instance's reindex method. We create a date range index with idx = pd. date_range('09-01-2020', '09-30-2020') . printed.
In order to check missing values in Pandas DataFrame, we use a function isnull() and notnull(). Both function help in checking whether a value is NaN or not. These function can also be used in Pandas Series in order to find null values in a series.
The method argument of fillna() can be used to replace missing values with previous/next valid values. If method is set to 'ffill' or 'pad' , missing values are replaced with previous valid values (= forward fill), and if 'bfill' or 'backfill' , replaced with the next valid values (= backward fill).
First convert column to datetime
s by to_datetime
, then create DatetimeIndex
by DataFrame.set_index
and call GroupBy.apply
with DataFrame.asfreq
- there is also possible specify method for forward or back filling missing values:
df['Date'] = pd.to_datetime(df['Date'])
df1 = (df.set_index('Date')
.groupby('ProdID')
.apply(lambda x: x.asfreq('D', method='ffill'))
.reset_index(level=0, drop=True)
.reset_index()
.reindex(df.columns, axis=1))
print (df1)
ProdID Date Val1 Val2 Val3
0 Prod1 2019-04-01 1 3 4
1 Prod1 2019-04-02 1 3 4
2 Prod1 2019-04-03 2 3 54
3 Prod1 2019-04-04 3 4 54
4 Prod2 2019-04-01 1 3 3
5 Prod2 2019-04-02 1 3 4
6 Prod2 2019-04-03 2 4 4
7 Prod2 2019-04-04 2 5 3
Another solution is create all combinations of product and datetimes
by product
and DataFrame.merge
with left join, last forward filling missing values by ffill
:
dates = pd.date_range(start=df['Date'].min(), end=df['Date'].max())
prods = df.ProdID.unique()
from itertools import product
df1 = pd.DataFrame(list(product(prods, dates)), columns=['ProdID', 'Date'])
print (df1)
ProdID Date
0 Prod1 2019-04-01
1 Prod1 2019-04-02
2 Prod1 2019-04-03
3 Prod1 2019-04-04
4 Prod2 2019-04-01
5 Prod2 2019-04-02
6 Prod2 2019-04-03
7 Prod2 2019-04-04
df = df1.merge(df, how='left').ffill()
print (df)
ProdID Date Val1 Val2 Val3
0 Prod1 2019-04-01 1.0 3.0 4.0
1 Prod1 2019-04-02 1.0 3.0 4.0
2 Prod1 2019-04-03 2.0 3.0 54.0
3 Prod1 2019-04-04 3.0 4.0 54.0
4 Prod2 2019-04-01 1.0 3.0 3.0
5 Prod2 2019-04-02 1.0 3.0 4.0
6 Prod2 2019-04-03 2.0 4.0 4.0
7 Prod2 2019-04-04 2.0 5.0 3.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With