
Pandas: Filling data for missing dates

Let's say I've got the following table:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3

Prod2 entries are populated correctly as we've got the data from 4/1/2019 to 4/4/2019.

Prod1 has 1 missing date - 4/2/2019.

I would like to find the missing dates for every ProdID and fill in Val1-3 with the values copied from the previous entry. For instance, for Prod1 I would like to copy the data from 4/1/2019 to 4/2/2019:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/2/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3
MarekK asked Apr 09 '19 10:04



People also ask

How do you fill in missing dates in pandas?

To add missing dates to a pandas DataFrame, reindex it against a complete DatetimeIndex. Create the full date range with idx = pd.date_range('09-01-2020', '09-30-2020'), then call reindex(idx) on a DataFrame or Series that is indexed by date.
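A minimal sketch of that reindexing step, assuming a small made-up Series (the name `s` and its values are illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical data: three observations in September 2020 with gaps between them.
s = pd.Series(
    [10, 20, 30],
    index=pd.to_datetime(["2020-09-01", "2020-09-03", "2020-09-06"]),
)

# Complete daily range for the month.
idx = pd.date_range("09-01-2020", "09-30-2020")

# Reindexing inserts every missing date; the new rows hold NaN until they are filled.
filled = s.reindex(idx)

print(filled.head(6))
```

The inserted rows stay NaN here; a fill step such as `ffill()` would have to follow if the previous value should be carried forward.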

How can pandas find missing values in dataset?

To check for missing values in a pandas DataFrame, use isnull() and notnull(). Both functions report, element by element, whether a value is missing (NaN or None). They are also available on a pandas Series.
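As a rough illustration, assuming a tiny invented frame (`df_demo` and its columns are hypothetical), the boolean mask from isnull() can be summed per column or used to select the affected rows:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
df_demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# True wherever a value is missing.
mask = df_demo.isnull()

# Number of missing values per column.
missing_per_column = mask.sum()

# Rows that contain at least one missing value.
rows_with_missing = df_demo[mask.any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```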

How do I find and replace missing values in pandas?

The method argument of fillna() can replace missing values with the previous or next valid value. With 'ffill' or 'pad', missing values are replaced by the previous valid value (forward fill); with 'bfill' or 'backfill', by the next valid value (backward fill). Recent pandas releases deprecate this argument in favor of the dedicated ffill() and bfill() methods.
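A short sketch of both fill directions, using the dedicated ffill() and bfill() methods on an invented Series (the data is illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a two-element gap in the middle.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill: each gap takes the previous valid value.
forward = s.ffill()

# Backward fill: each gap takes the next valid value.
backward = s.bfill()

print(forward.tolist())
print(backward.tolist())
```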


1 Answer

First convert the Date column to datetimes with to_datetime, then create a DatetimeIndex with DataFrame.set_index and call GroupBy.apply with DataFrame.asfreq. asfreq also accepts a method argument for forward or backward filling the missing values:

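import pandas as pd

# Parse the date strings so pandas can build a DatetimeIndex
df['Date'] = pd.to_datetime(df['Date'])

# For each product, expand to daily frequency and forward fill the new rows
df1 = (df.set_index('Date')
         .groupby('ProdID')
         .apply(lambda x: x.asfreq('D', method='ffill'))
         .reset_index(level=0, drop=True)
         .reset_index()
         .reindex(df.columns, axis=1))

print(df1)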
  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01     1     3     4
1  Prod1 2019-04-02     1     3     4
2  Prod1 2019-04-03     2     3    54
3  Prod1 2019-04-04     3     4    54
4  Prod2 2019-04-01     1     3     3
5  Prod2 2019-04-02     1     3     4
6  Prod2 2019-04-03     2     4     4
7  Prod2 2019-04-04     2     5     3
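For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the table from the question. It swaps GroupBy.apply for an explicit loop over the groups, which is my own variation rather than the code above, and it also collects the missing dates per product. The names `pieces` and `missing` are illustrative.

```python
import pandas as pd

# Sample table from the question.
df = pd.DataFrame({
    "ProdID": ["Prod1"] * 3 + ["Prod2"] * 4,
    "Date": ["4/1/2019", "4/3/2019", "4/4/2019",
             "4/1/2019", "4/2/2019", "4/3/2019", "4/4/2019"],
    "Val1": [1, 2, 3, 1, 1, 2, 2],
    "Val2": [3, 3, 4, 3, 3, 4, 5],
    "Val3": [4, 54, 54, 3, 4, 4, 3],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

pieces = []   # filled frame per product
missing = {}  # missing dates per product

for prod_id, group in df.groupby("ProdID"):
    # Dates expected between this product's first and last entry.
    full = pd.date_range(group["Date"].min(), group["Date"].max(), freq="D")
    missing[prod_id] = full.difference(pd.DatetimeIndex(group["Date"]))

    # asfreq needs a DatetimeIndex; ffill copies the previous row into new dates.
    filled = group.set_index("Date").asfreq("D", method="ffill")
    pieces.append(filled.reset_index())

# Stitch the products back together in the original column order.
df1 = pd.concat(pieces, ignore_index=True)[df.columns]

print(missing["Prod1"])
print(df1)
```

Note that each product is only filled between its own first and last date, so a product whose data starts later than the others gets no rows before its first entry.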

Another solution is to create all combinations of products and dates with itertools.product, join them to the original data with DataFrame.merge using a left join, and finally forward fill the missing values with ffill:

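from itertools import product

# All calendar dates between the earliest and latest date in the data
dates = pd.date_range(start=df['Date'].min(), end=df['Date'].max())
prods = df['ProdID'].unique()

# One row for every (product, date) combination
df1 = pd.DataFrame(list(product(prods, dates)), columns=['ProdID', 'Date'])
print(df1)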
  ProdID       Date
0  Prod1 2019-04-01
1  Prod1 2019-04-02
2  Prod1 2019-04-03
3  Prod1 2019-04-04
4  Prod2 2019-04-01
5  Prod2 2019-04-02
6  Prod2 2019-04-03
7  Prod2 2019-04-04

# The left join keeps every combination; ffill copies the previous row into the gaps
df = df1.merge(df, how='left').ffill()
print(df)
  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01   1.0   3.0   4.0
1  Prod1 2019-04-02   1.0   3.0   4.0
2  Prod1 2019-04-03   2.0   3.0  54.0
3  Prod1 2019-04-04   3.0   4.0  54.0
4  Prod2 2019-04-01   1.0   3.0   3.0
5  Prod2 2019-04-02   1.0   3.0   4.0
6  Prod2 2019-04-03   2.0   4.0   4.0
7  Prod2 2019-04-04   2.0   5.0   3.0
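The values print as floats because the left join introduces NaN before the fill. One thing to watch with this approach: a plain ffill runs over the whole frame, so if a product has no row for the first date, it inherits the last row of the product above it. The sketch below uses an invented variation of the data (Prod2 missing on the first date) to show the difference from a group-wise fill. The names `grid`, `merged`, `plain` and `grouped` are illustrative.

```python
from itertools import product

import pandas as pd

# Hypothetical data: Prod2 has no entry for the first date.
df = pd.DataFrame({
    "ProdID": ["Prod1", "Prod1", "Prod2"],
    "Date": pd.to_datetime(["2019-04-01", "2019-04-02", "2019-04-02"]),
    "Val1": [1, 2, 9],
})

# Every (product, date) combination, then a left join onto the real data.
dates = pd.date_range(df["Date"].min(), df["Date"].max())
grid = pd.DataFrame(list(product(df["ProdID"].unique(), dates)),
                    columns=["ProdID", "Date"])
merged = grid.merge(df, how="left", on=["ProdID", "Date"])

# Plain ffill: Prod2 on 2019-04-01 picks up Prod1's last value.
plain = merged.ffill()

# Group-wise ffill: the gap stays NaN because Prod2 has no earlier entry.
grouped = merged.copy()
grouped["Val1"] = merged.groupby("ProdID")["Val1"].ffill()

print(plain)
print(grouped)
```

With the data in the question this does not matter, since both products start on the same date.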
jezrael answered Oct 03 '22 01:10
