I currently have the following sample dataframe:
No FlNo DATE Loc Type
20 1826 6/1/2017 AAA O
20 1112 6/4/2017 BBB O
20 1234 6/6/2017 CCC O
20 43 6/7/2017 DDD O
20 1840 6/8/2017 EEE O
I want to fill in the missing dates between each pair of consecutive rows. The non-date columns of the filled-in rows should take their values from the row above, BUT the 'Type' column should be left blank for the filled-in rows.
Please see desired output:
No FlNo DATE Loc Type
20 1826 6/1/2017 AAA O
20 1826 6/2/2017 AAA
20 1826 6/3/2017 AAA
20 1112 6/4/2017 BBB O
20 1112 6/5/2017 BBB
20 1234 6/6/2017 CCC O
20 43 6/7/2017 DDD O
20 1840 6/8/2017 EEE O
I have searched Google and Stack Overflow but did not find any answers about filling in missing dates in a pandas DataFrame.
First, convert DATE to a datetime column using pd.to_datetime:
df.DATE = pd.to_datetime(df.DATE)
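For anyone who wants to reproduce this, here is a minimal sketch that builds the sample frame from the question and does the conversion. The explicit format string is my assumption (month/day/year, which matches the desired output filling 6/2 and 6/3 between 6/1 and 6/4); without it pandas infers the format.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "No": [20, 20, 20, 20, 20],
    "FlNo": [1826, 1112, 1234, 43, 1840],
    "DATE": ["6/1/2017", "6/4/2017", "6/6/2017", "6/7/2017", "6/8/2017"],
    "Loc": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Type": ["O"] * 5,
})

# Assumption: dates are month/day/year; passing the format avoids ambiguous parsing
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")
```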
Option 1
Use resample + ffill, and then reset the Type column later. First, store the unique dates in some list:
dates = df.DATE.unique()
Now,
df = df.set_index('DATE').resample('1D').ffill().reset_index()
df.Type = df.Type.where(df.DATE.isin(dates), '')
df
DATE No FlNo Loc Type
0 2017-06-01 20 1826 AAA O
1 2017-06-02 20 1826 AAA
2 2017-06-03 20 1826 AAA
3 2017-06-04 20 1112 BBB O
4 2017-06-05 20 1112 BBB
5 2017-06-06 20 1234 CCC O
6 2017-06-07 20 43 DDD O
7 2017-06-08 20 1840 EEE O
If needed, you may bring DATE back to a string (note that %m/%d/%Y zero-pads, so you get 06/01/2017 rather than 6/1/2017):
df.DATE = df.DATE.dt.strftime('%m/%d/%Y')
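Putting the Option 1 steps together, here is a rough self-contained sketch. The helper name fill_missing_dates and its parameters are hypothetical (not part of pandas), and it assumes the dates are unique, sorted, and in month/day/year form as in the sample.

```python
import pandas as pd

def fill_missing_dates(df, date_col="DATE", blank_col="Type"):
    """Hypothetical helper: insert one row per missing day, forward-filling
    every column except blank_col, which is left empty on inserted rows."""
    out = df.copy()
    # Assumption: month/day/year strings, as in the question's sample
    out[date_col] = pd.to_datetime(out[date_col], format="%m/%d/%Y")
    # Remember which dates were present before resampling
    original_dates = out[date_col].unique()
    # Upsample to daily frequency and copy values down from the row above
    out = out.set_index(date_col).resample("1D").ffill().reset_index()
    # Keep blank_col only on rows whose date existed originally
    out[blank_col] = out[blank_col].where(out[date_col].isin(original_dates), "")
    return out

sample = pd.DataFrame({
    "No": [20, 20, 20, 20, 20],
    "FlNo": [1826, 1112, 1234, 43, 1840],
    "DATE": ["6/1/2017", "6/4/2017", "6/6/2017", "6/7/2017", "6/8/2017"],
    "Loc": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Type": ["O"] * 5,
})
filled = fill_missing_dates(sample)
print(filled)
```

Note that after reset_index the DATE column comes first, as in the output shown above; reorder the columns afterwards if the original order matters.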
Option 2
Another option would be asfreq + ffill + fillna:
df = df.set_index('DATE').asfreq('1D').reset_index()
c = df.columns.difference(['Type'])
df[c] = df[c].ffill()
df['Type'] = df['Type'].fillna('')
df
DATE No FlNo Loc Type
0 2017-06-01 20.0 1826.0 AAA O
1 2017-06-02 20.0 1826.0 AAA
2 2017-06-03 20.0 1826.0 AAA
3 2017-06-04 20.0 1112.0 BBB O
4 2017-06-05 20.0 1112.0 BBB
5 2017-06-06 20.0 1234.0 CCC O
6 2017-06-07 20.0 43.0 DDD O
7 2017-06-08 20.0 1840.0 EEE O
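As the output shows, No and FlNo come back as floats with Option 2: asfreq inserts NaN rows, which upcasts integer columns before ffill fills them. If that matters, one way to restore them is a cast after filling. A minimal sketch on the sample data (the int64 target dtype is my assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "No": [20, 20, 20, 20, 20],
    "FlNo": [1826, 1112, 1234, 43, 1840],
    "DATE": ["6/1/2017", "6/4/2017", "6/6/2017", "6/7/2017", "6/8/2017"],
    "Loc": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Type": ["O"] * 5,
})
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")

# Option 2: asfreq inserts NaN rows for the missing days
df = df.set_index("DATE").asfreq("1D").reset_index()
c = df.columns.difference(["Type"])
df[c] = df[c].ffill()
df["Type"] = df["Type"].fillna("")

# No NaN remains in these columns after ffill, so the cast back is safe
df[["No", "FlNo"]] = df[["No", "FlNo"]].astype("int64")
print(df)
```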