I have a dataset with missing dates, like this:
date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672
This is what I want to achieve:
date,value
2015-01-01,7392
2015-01-02,7392 # ffill 1st
2015-01-03,4928
2015-01-04,4928 # ffill 3rd
2015-01-05,4928 # ffill 3rd
2015-01-06,8672
I tried a lot and read the documentation, but I could not find a solution. I guessed that df.resample('d',fill_method='ffill') would work, but I still can't get there. Could anyone help me solve the problem?
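For reference, the `fill_method` argument to `resample` is deprecated and not accepted by recent pandas versions; the fill step is chained onto the resampler instead. A minimal sketch, assuming the sample data above is read from an in-memory CSV string (the variable name `csv_text` is only for illustration):

```python
import io

import pandas as pd

# Sample data from the question, held in a string for this sketch
# (csv_text is an illustrative name, not from the original post).
csv_text = """date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672
"""

# parse_dates=True turns the 'date' index column into a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Upsample to daily frequency; each new day takes the last known value.
daily = df.resample("D").ffill()
print(daily)
```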
This is what I did.
>>> import pandas as pd
>>> df = pd.read_csv(text,sep="\t",index_col='date')
>>> df.index = pd.to_datetime(df.index)
>>> index = pd.date_range(df.index.min(), df.index.max())
Here I get the DatetimeIndex from 2015-01-01 to 2015-01-06.
>>> values = list(range(len(index)))
>>> df2 = pd.DataFrame(values,index=index)
Next I am going to merge the original data and DatetimeIndex.
>>> df + df2
             0  value
2015-01-01 NaN    NaN
2015-01-02 NaN    NaN
2015-01-03 NaN    NaN
2015-01-04 NaN    NaN
2015-01-05 NaN    NaN
2015-01-06 NaN    NaN
NaN? I am puzzled.
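Why `df + df2` is all NaN: arithmetic between two DataFrames aligns on both the row index and the column labels. `df` has a column named `'value'`, while `df2` gets the default column name `0`, so no cell exists in both frames and every sum is NaN. A minimal sketch reproducing this with the sample data (built inline here instead of read from a file):

```python
import pandas as pd

index = pd.date_range("2015-01-01", "2015-01-06")

# One column named 'value', covering three of the six dates.
df = pd.DataFrame(
    {"value": [7392, 4928, 8672]},
    index=pd.to_datetime(["2015-01-01", "2015-01-03", "2015-01-06"]),
)

# One column with the default label 0, covering all six dates.
df2 = pd.DataFrame(list(range(len(index))), index=index)

# '+' aligns on BOTH axes: the result has the union of the row labels and
# the union of the column labels ('value' and 0). No (row, column) pair is
# present in both frames, so every result cell is NaN.
total = df + df2
print(total)
```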
>>> df3 = df + df2
>>> df3.info()
DatetimeIndex: 10 entries, 2015-01-01 to 2015-01-10
Data columns (total 2 columns):
value 0 non-null float64
dtypes: float64(1)
The original values were integers, but they were converted to floats.
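The int-to-float change is expected: `int64` has no way to represent NaN, so any operation that introduces missing values upcasts the column to `float64`. A small sketch, assuming a Series built from the sample data, showing the upcast and casting back once the gaps are filled:

```python
import pandas as pd

s = pd.Series(
    [7392, 4928, 8672],
    index=pd.to_datetime(["2015-01-01", "2015-01-03", "2015-01-06"]),
)
print(s.dtype)  # int64

# Reindexing to every day inserts NaN for the missing dates; int64 cannot
# hold NaN, so pandas upcasts the whole column to float64.
full = s.reindex(pd.date_range("2015-01-01", "2015-01-06"))
print(full.dtype)  # float64

# After forward-filling no NaN is left, so casting back to int64 is safe.
filled = full.ffill().astype("int64")
print(filled.dtype)  # int64
```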
What is my mistake?
Try this:
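import numpy as np
import pandas as pd
# An all-NaN frame covering every day, with the same columns as df
df2 = pd.DataFrame(np.nan, index=index, columns=df.columns)
# Keep df's values where they exist, then forward-fill the gaps
df.combine_first(df2).ffill()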
combine_first aligns both frames on the union of their indexes and keeps the value from the original df wherever one exists; the dates that only appear in df2 stay NaN. You can then fill those remaining NaN values by forward-filling them.
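An alternative to `combine_first` (a different technique from the one in this answer) is to `reindex` the original frame onto the full daily range, which keeps the existing `'value'` column and inserts NaN rows for the missing dates. A minimal sketch, assuming the sample data is loaded from an in-memory CSV string (`csv_text` is an illustrative name):

```python
import io

import pandas as pd

csv_text = """date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Every calendar day between the first and last date in the data.
index = pd.date_range(df.index.min(), df.index.max(), freq="D")

# reindex conforms df to the new index, keeping the 'value' column and
# inserting NaN rows for missing days; ffill copies the last value forward.
result = df.reindex(index).ffill()

# The NaN rows made the column float; cast back now that the gaps are filled.
result["value"] = result["value"].astype("int64")
print(result)
```

Since `reindex` also accepts a `method` argument, `df.reindex(index, method='ffill')` should fill the gaps in a single call.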