I am trying to use the resample method to fill the gaps in time-series data, but I also want to know which row was used to fill each missing value.
This is my input series.
In [28]: data
Out[28]:
Date
2002-09-09 233.25
2002-09-11 233.05
2002-09-16 230.25
2002-09-18 230.10
2002-09-19 230.05
Name: Price
With resample, I get this:
In [29]: data.resample("D", fill_method='bfill')
Out[29]:
Date
2002-09-09 233.25
2002-09-10 233.05
2002-09-11 233.05
2002-09-12 230.25
2002-09-13 230.25
2002-09-14 230.25
2002-09-15 230.25
2002-09-16 230.25
2002-09-17 230.10
2002-09-18 230.10
2002-09-19 230.05
Freq: D
I am looking for this, with an extra column showing the date of the row each value came from:
Out[29]:
Date
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
Any help?
resample() is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed to the on or level keyword.
resample is more general than asfreq. For example, with resample you can pass an arbitrary function to aggregate a Series or DataFrame over bins of arbitrary size. asfreq is a concise way of changing the frequency of an object with a DatetimeIndex, and it can also fill the newly created gaps (padding or backfilling).
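To make the difference concrete, here is a minimal sketch using the Price series from the question. The weekly bin size and the max-minus-min function are my own choices for illustration, not something from the original post.

```python
import pandas as pd

# The input series from the question, rebuilt by hand.
s = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.to_datetime(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
    ),
    name="Price",
)

# asfreq only changes the frequency: new dates appear, and they hold NaN
# unless a fill method is given.
daily = s.asfreq("D")
daily_bfill = s.asfreq("D", method="bfill")

# resample groups rows into bins and applies a function to each bin.
# Illustrative choice: the price range (max - min) within each week.
weekly_range = s.resample("W").apply(lambda x: x.max() - x.min())

print(daily)
print(daily_bfill)
print(weekly_range)
```

So asfreq is enough when you only need to reindex and fill, while resample is the tool when each bin has to be summarized somehow.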
Resampling hourly data to daily data with resample(): to aggregate the data over a time period, you can take all of the values for each day and summarize them. For example, if you want total daily rainfall, use resample() together with .sum().
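A small sketch of that hourly-to-daily aggregation follows. The rainfall values are made up (0.5 mm every hour for two days) and the name rain_mm is hypothetical, just so the daily totals are easy to check by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 48 hourly rainfall readings of 0.5 mm each (two full days).
hours = pd.date_range("2002-09-09", periods=48, freq=pd.Timedelta(hours=1))
rain = pd.Series(np.full(48, 0.5), index=hours, name="rain_mm")

# Group the hourly readings into calendar-day bins and add them up.
daily_rain = rain.resample("D").sum()
print(daily_rain)
```

Each day contains 24 readings of 0.5, so both daily totals come out as 12.0.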
After converting the Series to a DataFrame, copy the index into its own column. (DatetimeIndex.format() is useful here as it returns a string representation of the index, rather than Timestamp/datetime objects.)
In [510]: df = pd.DataFrame(data)
In [511]: df['OrigDate'] = df.index.format()
In [513]: df
Out[513]:
Price OrigDate
Date
2002-09-09 233.25 2002-09-09
2002-09-11 233.05 2002-09-11
2002-09-16 230.25 2002-09-16
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
For resampling without aggregation, there is a helper method asfreq().
In [528]: df.asfreq("D", method='bfill')
Out[528]:
Price OrigDate
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
This is effectively short-hand for the following, where last() is invoked on the intermediate DataFrameGroupBy objects.
In [529]: df.resample("D", how='last', fill_method='bfill')
Out[529]:
Price OrigDate
Date
2002-09-09 233.25 2002-09-09
2002-09-10 233.05 2002-09-11
2002-09-11 233.05 2002-09-11
2002-09-12 230.25 2002-09-16
2002-09-13 230.25 2002-09-16
2002-09-14 230.25 2002-09-16
2002-09-15 230.25 2002-09-16
2002-09-16 230.25 2002-09-16
2002-09-17 230.10 2002-09-18
2002-09-18 230.10 2002-09-18
2002-09-19 230.05 2002-09-19
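Note that the how= and fill_method= keywords belong to old pandas releases; they were deprecated in pandas 0.18 and are not accepted by current versions. Below is a rough sketch of how the same steps might be written with the method-chaining API. It rebuilds the series by hand, and it swaps in index.strftime("%Y-%m-%d") for DatetimeIndex.format(), which is my substitution rather than part of the original answer.

```python
import pandas as pd

# The input series from the question, rebuilt by hand.
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

# Copy the index into its own column (as strings) so that the fill step
# carries the source date along with the price.
df = data.to_frame()
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# Option 1: change the frequency without aggregation and backfill the gaps.
filled_asfreq = df.asfreq("D", method="bfill")

# Option 2: the chained form of resample(how='last', fill_method='bfill').
filled_resample = df.resample("D").last().bfill()

print(filled_asfreq)
```

Both options should give the same eleven rows, with OrigDate showing which original row supplied each price.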