I have a dataset with missing dates, like this.
date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672
This is what I expect to achieve.
date,value
2015-01-01,7392
2015-01-02,7392 # ffill 1st
2015-01-03,4928
2015-01-04,4928 # ffill 3rd
2015-01-05,4928 # ffill 3rd
2015-01-06,8672
I tried a lot and read the documentation, but I could not find a solution. I guessed that df.resample('d',fill_method='ffill') would do it, but I still can't get there. Could anyone help me solve the problem?
This is what I did.
>>> import pandas as pd
>>> df = pd.read_csv(text,sep="\t",index_col='date')
>>> df.index = pd.to_datetime(df.index)
>>> index = pd.date_range(df.index.min(),df.index.max())
Here I get the DatetimeIndex from 2015-01-01 to 2015-01-06.
>>> values = [ x for x in range(len(index)) ]
>>> df2 = pd.DataFrame(values,index=index)
Next I try to merge the original data with the new DatetimeIndex.
>>> df + df2
0 value
2015-01-01 NaN NaN
2015-01-02 NaN NaN
2015-01-03 NaN NaN
2015-01-04 NaN NaN
2015-01-05 NaN NaN
2015-01-06 NaN NaN
NaN? I am puzzled.
>>> df3 = df + df2
>>> df3.info()
DatetimeIndex: 10 entries, 2015-01-01 to 2015-01-10
Data columns (total 2 columns):
value 0 non-null float64
dtypes: float64(1)
The original values were int, but they were converted to float.
What is my mistake?
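Try this:
import numpy as np
df2 = pd.DataFrame(np.nan, index=index, columns=df.columns)
df.combine_first(df2).ffill()
combine_first keeps the values from the original df where they exist and takes the NaN placeholders from df2 for the missing dates. You can then fill the remaining NaN values with ffill, which is equivalent to fillna(method='ffill') (deprecated in recent pandas versions).
As for the NaN in your attempt: df + df2 aligns on column labels as well as on the index. The columns value and 0 do not match, so every cell comes out NaN, and NaN forces the float dtype.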
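For reference, here is a minimal end-to-end sketch of the approach above, using the sample data from the question. It assumes the data is comma-separated as displayed; the names csv_text, full_index, empty, filled, reindexed and resampled are illustrative and not from the original post. It also shows two shorter alternatives, reindex and asfreq, that should give the same result without building a placeholder frame.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question (assumed comma-separated, as displayed).
csv_text = """date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672
"""

# Parse the 'date' column directly into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Every calendar day between the first and last observed date.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Approach from the answer: an all-NaN frame on the full index,
# combined with the original data, then forward-filled.
empty = pd.DataFrame(np.nan, index=full_index, columns=df.columns)
filled = df.combine_first(empty).ffill()

# The intermediate NaN values force float64; cast back once the gaps are filled.
filled = filled.astype(int)

# Alternative 1: reindex onto the full range and forward-fill in one step.
reindexed = df.reindex(full_index, method="ffill")

# Alternative 2: convert to daily frequency, forward-filling the new rows.
resampled = df.asfreq("D", method="ffill")

print(filled)
```

Because reindex and asfreq fill the gaps as the new rows are created, no NaN ever appears in the column, so the integer dtype should be preserved without the extra cast.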