Pandas: Filling data for missing dates

Tags: python, pandas, missing-data

Let's say I've got the following table:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3

Prod2 entries are populated correctly as we've got the data from 4/1/2019 to 4/4/2019.

Prod1 has one missing date: 4/2/2019.

I would like to find the missing dates for every ProdID and fill in Val1 to Val3 with the values copied from the most recent earlier entry of the same product. For instance, for Prod1 I would like to copy the data from 4/1/2019 to 4/2/2019:

ProdID  Date        Val1 Val2 Val3
Prod1   4/1/2019    1    3    4
Prod1   4/2/2019    1    3    4
Prod1   4/3/2019    2    3    54
Prod1   4/4/2019    3    4    54
Prod2   4/1/2019    1    3    3
Prod2   4/2/2019    1    3    4
Prod2   4/3/2019    2    4    4
Prod2   4/4/2019    2    5    3

asked Apr 09 '19 10:04

MarekK

1 Answer

First convert the Date column to datetimes with to_datetime, then create a DatetimeIndex with DataFrame.set_index and call GroupBy.apply with DataFrame.asfreq. asfreq also accepts a method argument for forward or backward filling the missing values:

df['Date'] = pd.to_datetime(df['Date'])

df1 = (df.set_index('Date')
         .groupby('ProdID')
         .apply(lambda x: x.asfreq('D', method='ffill'))
         .reset_index(level=0, drop=True)
         .reset_index()
         .reindex(df.columns, axis=1))

print (df1)
  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01     1     3     4
1  Prod1 2019-04-02     1     3     4
2  Prod1 2019-04-03     2     3    54
3  Prod1 2019-04-04     3     4    54
4  Prod2 2019-04-01     1     3     3
5  Prod2 2019-04-02     1     3     4
6  Prod2 2019-04-03     2     4     4
7  Prod2 2019-04-04     2     5     3
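In recent pandas releases, GroupBy.apply may warn about, or stop passing, the grouping column to the applied function, in which case ProdID would be lost after the first reset_index above. The following is a minimal self-contained sketch of the same asfreq idea that avoids apply by looping over the groups. The names value_cols, parts and filled_df are only illustrative, and the explicit date format is an assumption about how the dates are stored.

```python
import pandas as pd

# Sample data from the question (Prod1 has no row for 4/2/2019).
df = pd.DataFrame({
    'ProdID': ['Prod1'] * 3 + ['Prod2'] * 4,
    'Date': ['4/1/2019', '4/3/2019', '4/4/2019',
             '4/1/2019', '4/2/2019', '4/3/2019', '4/4/2019'],
    'Val1': [1, 2, 3, 1, 1, 2, 2],
    'Val2': [3, 3, 4, 3, 3, 4, 5],
    'Val3': [4, 54, 54, 3, 4, 4, 3],
})
# Assumption: dates are month/day/year strings.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

value_cols = ['Val1', 'Val2', 'Val3']  # illustrative name
parts = []
for prod_id, group in df.groupby('ProdID'):
    # Daily frequency between this product's first and last date,
    # new rows take the values of the previous existing row.
    filled = (group.set_index('Date')[value_cols]
                   .sort_index()
                   .asfreq('D', method='ffill'))
    filled.insert(0, 'ProdID', prod_id)
    parts.append(filled.reset_index())

# Stack the per-product frames and restore the original column order.
filled_df = pd.concat(parts, ignore_index=True)[df.columns.tolist()]
print(filled_df)
```

Like the asfreq call above, this fills only between each product's own first and last date, so nothing is added before a product's first entry or after its last one.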

Another solution is to create all combinations of products and dates with itertools.product, left-join the original data onto them with DataFrame.merge, and finally forward fill the missing values with ffill:

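from itertools import product

# every calendar day between the earliest and latest date in the data
dates = pd.date_range(start=df['Date'].min(), end=df['Date'].max())
prods = df.ProdID.unique()

# one row per (product, date) pair
df1 = pd.DataFrame(list(product(prods, dates)), columns=['ProdID', 'Date'])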
print (df1)
  ProdID       Date
0  Prod1 2019-04-01
1  Prod1 2019-04-02
2  Prod1 2019-04-03
3  Prod1 2019-04-04
4  Prod2 2019-04-01
5  Prod2 2019-04-02
6  Prod2 2019-04-03
7  Prod2 2019-04-04

df = df1.merge(df, how='left').ffill()
print (df)
  ProdID       Date  Val1  Val2  Val3
0  Prod1 2019-04-01   1.0   3.0   4.0
1  Prod1 2019-04-02   1.0   3.0   4.0
2  Prod1 2019-04-03   2.0   3.0  54.0
3  Prod1 2019-04-04   3.0   4.0  54.0
4  Prod2 2019-04-01   1.0   3.0   3.0
5  Prod2 2019-04-02   1.0   3.0   4.0
6  Prod2 2019-04-03   2.0   4.0   4.0
7  Prod2 2019-04-04   2.0   5.0   3.0
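One thing to watch with this second approach: the date range is shared by all products, and a plain ffill over the whole frame does not stop at product boundaries. If a product has no row for the first date, it would inherit the last values of the product above it. The sketch below is one possible variant, not the only way to do it: it builds the same grid with MultiIndex.from_product and forward fills within each ProdID. The sample data is deliberately modified (Prod2 has no 4/1/2019 row) to show the difference, and the names sample, full_index and out are illustrative.

```python
import pandas as pd

# Modified sample (assumption for illustration): Prod2 starts on 4/2/2019.
sample = pd.DataFrame({
    'ProdID': ['Prod1'] * 3 + ['Prod2'] * 3,
    'Date': pd.to_datetime(['4/1/2019', '4/3/2019', '4/4/2019',
                            '4/2/2019', '4/3/2019', '4/4/2019'],
                           format='%m/%d/%Y'),
    'Val1': [1, 2, 3, 1, 2, 2],
    'Val2': [3, 3, 4, 3, 4, 5],
    'Val3': [4, 54, 54, 4, 4, 3],
})

# All (product, date) combinations over the global date range.
dates = pd.date_range(sample['Date'].min(), sample['Date'].max(), freq='D')
full_index = pd.MultiIndex.from_product(
    [sample['ProdID'].unique(), dates], names=['ProdID', 'Date'])

out = (sample.set_index(['ProdID', 'Date'])
             .reindex(full_index)        # missing combinations become NaN rows
             .groupby(level='ProdID')
             .ffill()                    # fill only within the same product
             .reset_index())
print(out)
```

Here the Prod2 row for 2019-04-01 stays NaN because Prod2 has no earlier entry to copy from, whereas a frame-wide ffill would have copied Prod1's values from 2019-04-04 into it. As in the merge output above, the value columns become floats because NaN is introduced before filling.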

answered Oct 03 '22 01:10

jezrael
