
Resample rows for missing dates and forward fill values in all columns except one

I currently have the following sample dataframe:

No  FlNo    DATE        Loc    Type
20  1826    6/1/2017    AAA    O
20  1112    6/4/2017    BBB    O
20  1234    6/6/2017    CCC    O
20  43      6/7/2017    DDD    O
20  1840    6/8/2017    EEE    O

I want to insert a row for every date missing between two consecutive rows. Each inserted row should copy the non-date columns from the row above it, BUT the 'Type' column should be left blank for the inserted rows.

Please see desired output:

No  FlNo    DATE        Loc    Type 
20  1826    6/1/2017    AAA    O
20  1826    6/2/2017    AAA
20  1826    6/3/2017    AAA
20  1112    6/4/2017    BBB    O
20  1112    6/5/2017    BBB
20  1234    6/6/2017    CCC    O
20  43      6/7/2017    DDD    O
20  1840    6/8/2017    EEE    O

I have searched Google and Stack Overflow but did not find any answers on filling in missing dates for a pandas DataFrame.

PineNuts0 asked Mar 07 '23 13:03



1 Answer

First, convert DATE to a datetime column using pd.to_datetime:

df.DATE = pd.to_datetime(df.DATE)
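For anyone who wants to reproduce this, here is a minimal sketch that builds the sample frame from the question and converts DATE. The explicit format string is an assumption on my part (that the dates are month/day/year); the one-liner above lets pandas infer it instead.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "No": [20, 20, 20, 20, 20],
    "FlNo": [1826, 1112, 1234, 43, 1840],
    "DATE": ["6/1/2017", "6/4/2017", "6/6/2017", "6/7/2017", "6/8/2017"],
    "Loc": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Type": ["O", "O", "O", "O", "O"],
})

# Assumption: the strings are month/day/year, so the format is stated explicitly.
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")
```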

Option 1
Use resample + ffill, and then reset the Type column later. First, store the unique dates in some list:

dates = df.DATE.unique()

Now,

df = df.set_index('DATE').resample('1D').ffill().reset_index()
df.Type = df.Type.where(df.DATE.isin(dates), '')

df

        DATE  No  FlNo  Loc Type
0 2017-06-01  20  1826  AAA    O
1 2017-06-02  20  1826  AAA     
2 2017-06-03  20  1826  AAA     
3 2017-06-04  20  1112  BBB    O
4 2017-06-05  20  1112  BBB     
5 2017-06-06  20  1234  CCC    O
6 2017-06-07  20    43  DDD    O
7 2017-06-08  20  1840  EEE    O

If needed, you may convert DATE back to a string (note that '%m/%d/%Y' zero-pads, so you get 06/01/2017 rather than 6/1/2017):

df.DATE = df.DATE.dt.strftime('%m/%d/%Y')
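If you need this more than once, Option 1 could be wrapped in a small helper. This is only a sketch: the function name fill_missing_dates and its parameters are made up for illustration, and it assumes the dates are unique and already converted to datetime.

```python
import pandas as pd

def fill_missing_dates(df, date_col="DATE", blank_col="Type"):
    """Hypothetical helper: add one row per missing day, forward-filling
    every column, then blank out `blank_col` on the rows that were added."""
    # Remember which dates were present before resampling.
    original_dates = df[date_col].unique()
    out = df.set_index(date_col).resample("1D").ffill().reset_index()
    # Keep blank_col only where the date existed in the input.
    out[blank_col] = out[blank_col].where(out[date_col].isin(original_dates), "")
    return out

# Sample data copied from the question.
sample = pd.DataFrame({
    "No": [20, 20, 20, 20, 20],
    "FlNo": [1826, 1112, 1234, 43, 1840],
    "DATE": ["6/1/2017", "6/4/2017", "6/6/2017", "6/7/2017", "6/8/2017"],
    "Loc": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Type": ["O", "O", "O", "O", "O"],
})
sample["DATE"] = pd.to_datetime(sample["DATE"], format="%m/%d/%Y")

result = fill_missing_dates(sample)
print(result)
```

The helper returns a new frame and leaves the input untouched, so it is safe to call on a frame you still need in its original form.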

Option 2
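Another option would be asfreq + ffill + fillna. Note that asfreq puts NaN in the inserted rows, so the integer columns No and FlNo are upcast to float: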

df = df.set_index('DATE').asfreq('1D').reset_index()
c = df.columns.difference(['Type'])
df[c] = df[c].ffill()
df['Type'] = df['Type'].fillna('')

df
        DATE    No    FlNo  Loc Type
0 2017-06-01  20.0  1826.0  AAA    O
1 2017-06-02  20.0  1826.0  AAA     
2 2017-06-03  20.0  1826.0  AAA     
3 2017-06-04  20.0  1112.0  BBB    O
4 2017-06-05  20.0  1112.0  BBB     
5 2017-06-06  20.0  1234.0  CCC    O
6 2017-06-07  20.0    43.0  DDD    O
7 2017-06-08  20.0  1840.0  EEE    O
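If the float columns in the Option 2 output bother you, one way around it (a sketch, not the only way) is to remember the original dtypes and cast back once the NaNs have been forward-filled. The variable names here are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
sample = pd.DataFrame({
    "No": [20, 20, 20, 20, 20],
    "FlNo": [1826, 1112, 1234, 43, 1840],
    "DATE": ["6/1/2017", "6/4/2017", "6/6/2017", "6/7/2017", "6/8/2017"],
    "Loc": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Type": ["O", "O", "O", "O", "O"],
})
sample["DATE"] = pd.to_datetime(sample["DATE"], format="%m/%d/%Y")

# Dtypes of the columns that will be forward-filled (everything but DATE and Type).
original_dtypes = sample.dtypes.drop(["DATE", "Type"]).to_dict()

out = sample.set_index("DATE").asfreq("1D").reset_index()
cols = out.columns.difference(["DATE", "Type"])
out[cols] = out[cols].ffill()
# No NaNs remain in these columns after ffill, so casting back is safe.
out = out.astype(original_dtypes)
out["Type"] = out["Type"].fillna("")
print(out)
```

The cast only works because the first row is never missing; if a forward-filled column could still hold NaN, the integer cast would raise.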
cs95 answered Apr 08 '23 21:04
