How to complete time series data with some missing dates with pandas

Question

I have a dataset with missing dates, like this.

date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672

This is what I expect to achieve.

date,value
2015-01-01,7392
2015-01-02,7392 # ffill 1st
2015-01-03,4928
2015-01-04,4928 # ffill 3rd
2015-01-05,4928 # ffill 3rd
2015-01-06,8672

I tried a lot and read the documentation, but I could not find a solution. I guessed that df.resample('d',fill_method='ffill') might work, but I have not got there yet. Could anyone help me solve the problem?

This is what I did.

>>> import pandas as pd
>>> df = pd.read_csv(text,sep=",",index_col='date')
>>> df.index = pd.to_datetime(df.index)
>>> index = pd.date_range(df.index.min(),df.index.max())

Here I get the DatetimeIndex from 2015-01-01 to 2015-01-06.

>>> values = [ x for x in range(len(index)) ]
>>> df2 = pd.DataFrame(values,index=index)

Next I am going to merge the original data and DatetimeIndex.

>>> df + df2

             0   value
2015-01-01 NaN NaN
2015-01-02 NaN NaN
2015-01-03 NaN NaN
2015-01-04 NaN NaN
2015-01-05 NaN NaN
2015-01-06 NaN NaN

NaN? I am puzzled.

>>> df3 = df + df2
>>> df3.info()

DatetimeIndex: 6 entries, 2015-01-01 to 2015-01-06
Data columns (total 2 columns):
value    0 non-null float64
dtypes: float64(1)

The original value was int, but it was converted to float.

What is my mistake?
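
The all-NaN result above can be reproduced in isolation. The sketch below is only an illustration: it assumes the two frames are typed in by hand rather than read from a file, and the names total and partial are made up. It shows that + aligns on column labels as well as on the index, so a column named value and a column named 0 never meet. NaN is a float, which is also why the int column ends up as float64.

```python
import pandas as pd

index = pd.date_range("2015-01-01", "2015-01-06", freq="D")

# Original data: one column labelled "value", three dates.
df = pd.DataFrame(
    {"value": [7392, 4928, 8672]},
    index=pd.to_datetime(["2015-01-01", "2015-01-03", "2015-01-06"]),
)

# Helper frame as built in the question: its only column is labelled 0.
df2 = pd.DataFrame(list(range(len(index))), index=index)

# "+" aligns on row labels AND column labels. Neither column exists in
# both frames, so every cell is missing on one side and the sum is NaN.
total = df + df2
print(total.columns.tolist())
print(total.isna().all().all())

# With matching column labels, only the dates absent from df come out NaN.
df2.columns = ["value"]
partial = df + df2
print(partial["value"].notna().sum())
```

Even with matching labels, + adds the helper values to the data, so it is not a way to fill gaps; it only makes the alignment rule visible.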

IanS · Accepted Answer

Try this:

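import numpy as np
# Empty frame on the full daily index; a scalar fill needs explicit columns
df2 = pd.DataFrame(np.nan, index=index, columns=df.columns)
# Take values from df where present, then forward-fill the gaps
df.combine_first(df2).ffill()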

combine_first aligns the two frames on the union of their indexes and keeps the values from the original df wherever they exist; dates that appear only in df2 stay NaN. You can then forward-fill those remaining NaN values.
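
For reference, here is a minimal, self-contained sketch of this approach using the sample data from the question. It assumes the data is comma-separated and that the all-NaN helper frame carries the same column label as the original (value). The names csv_text, filled and reindexed are made up for illustration. The reindex call at the end is an alternative that the answer does not mention; it is shown because it never introduces NaN and so should keep the integer dtype.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question (assumed comma-separated).
csv_text = """date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Full daily index spanning the first to the last observed date.
index = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Approach from the answer: an all-NaN frame with the same column labels,
# combined with the original data, then forward-filled.
df2 = pd.DataFrame(np.nan, index=index, columns=df.columns)
filled = df.combine_first(df2).ffill()

# Alternative (not from the answer): reindex onto the full index and
# forward-fill in the same call, so no NaN is created along the way.
reindexed = df.reindex(index, method="ffill")

print(filled)
print(reindexed.dtypes)
```

The combine_first route passes through NaN, so the value column comes back as float; add .astype(int) at the end if integers are needed.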

Tags:

python

pandas

time-series

Akio Omi
