
How to complete time series data with some missing dates with pandas

I have a dataset with missing dates, like this.

date,value
2015-01-01,7392
2015-01-03,4928
2015-01-06,8672

This is what I expect to achieve.

date,value
2015-01-01,7392
2015-01-02,7392 # ffill 1st
2015-01-03,4928
2015-01-04,4928 # ffill 3rd
2015-01-05,4928 # ffill 3rd
2015-01-06,8672

I tried a lot and read the documentation, but I could not find a solution. I guessed that something like df.resample('d',fill_method='ffill') would do it, but I have not managed to get there. Could anyone help me solve this problem?

This is what I did.

>>> import pandas as pd
>>> df = pd.read_csv(text,sep="\t",index_col='date')
>>> df.index = pd.to_datetime(df.index)
>>> index = pd.date_range(df.index.min(),df.index.max())

Here I get the DatetimeIndex from 2015-01-01 to 2015-01-06.

>>> values = list(range(len(index)))
>>> df2 = pd.DataFrame(values,index=index)

Next I am going to merge the original data and DatetimeIndex.

>>> df + df2

             0   value
2015-01-01 NaN NaN
2015-01-02 NaN NaN
2015-01-03 NaN NaN
2015-01-04 NaN NaN
2015-01-05 NaN NaN
2015-01-06 NaN NaN

NaN? I am puzzled.

>>> df3 = df + df2
>>> df3.info()

DatetimeIndex: 10 entries, 2015-01-01 to 2015-01-10
Data columns (total 2 columns):
value    0 non-null float64
dtypes: float64(1)

The original values were int, but they were converted to float.

What is my mistake?

Akio Omi asked Nov 08 '22 16:11


1 Answer

Try this:

import numpy as np
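# An all-NaN frame over the full date range; a scalar fill value needs both index and columns
df2 = pd.DataFrame(np.nan, index=index, columns=df.columns)
# Keep the values from df where they exist, then forward-fill the gaps
df.combine_first(df2).ffill()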

combine_first aligns the two frames on the union of their indexes and keeps the values from the original df wherever they exist, so the dates that are missing from df come through as NaN rows. You can then forward-fill the remaining NaN values.
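For reference, here is a minimal end-to-end sketch of the approach above, using the sample data from the question. It assumes a comma-separated input held in an in-memory buffer (the name `text` and the use of `io.StringIO` are illustrative, not from the original post). The `reindex` variant at the end is an alternative that the answer does not mention; it is included only for comparison.

```python
import io

import pandas as pd

# Sample data copied from the question (assumed here to be comma-separated).
text = io.StringIO(
    "date,value\n"
    "2015-01-01,7392\n"
    "2015-01-03,4928\n"
    "2015-01-06,8672\n"
)
df = pd.read_csv(text, index_col="date", parse_dates=True)

# Full daily range from the first to the last date in the data.
index = pd.date_range(df.index.min(), df.index.max(), freq="D")

# Approach from the answer: an all-NaN frame with the same columns,
# combined with the original data and then forward-filled.
df2 = pd.DataFrame(index=index, columns=df.columns, dtype="float64")
filled = df.combine_first(df2).ffill().astype("int64")

# Alternative (not part of the original answer): reindex onto the
# full range and forward-fill in one step.
filled_alt = df.reindex(index, method="ffill")

print(filled)
```

As for the NaN result in the question: arithmetic between two DataFrames aligns on column labels as well as on the index, and the columns `value` and `0` have no label in common, so `df + df2` has nothing to add in any cell. NaN is a float, which is also why the integer column ends up as float64; the `astype("int64")` call above converts it back once no gaps remain.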

IanS answered Nov 14 '22 23:11