 

pandas resample with fill_method: how can I tell which row the data was copied from?

Tags:

python

pandas

I am trying to use the resample method to fill the gaps in time-series data, but I also want to know which row was used to fill each missing value.

This is my input series.

In [28]: data
Out[28]: 
Date
2002-09-09    233.25
2002-09-11    233.05
2002-09-16    230.25
2002-09-18    230.10
2002-09-19    230.05
Name: Price
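To reproduce this, the input Series can be rebuilt from the values printed above. This is a sketch that assumes a recent pandas version. The dates and prices are copied from the output, and nothing else about the original data is known.

```python
import pandas as pd

# Rebuild the question's input Series: closing prices indexed by date.
data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.to_datetime(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"]
    ),
    name="Price",
)
data.index.name = "Date"  # match the "Date" index label shown above
print(data)
```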

With resample, I get this:

In [29]: data.resample("D", fill_method='bfill')
Out[29]: 
Date
2002-09-09    233.25
2002-09-10    233.05
2002-09-11    233.05
2002-09-12    230.25
2002-09-13    230.25
2002-09-14    230.25
2002-09-15    230.25
2002-09-16    230.25
2002-09-17    230.10
2002-09-18    230.10
2002-09-19    230.05
Freq: D
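In current pandas, resample() no longer accepts the fill_method keyword (it was deprecated and later removed). Assuming a recent pandas release, the same daily back-fill is written by calling bfill() on the resampler, as in this minimal sketch:

```python
import pandas as pd

data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

# Upsample to daily frequency; each newly created day takes the next known price.
filled = data.resample("D").bfill()
print(filled)
```

This only recovers the prices. The source dates are lost because only the Price values are carried into the gaps.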

I am looking for this:

Out[29]: 
Date
2002-09-09    233.25  2002-09-09
2002-09-10    233.05  2002-09-11
2002-09-11    233.05  2002-09-11
2002-09-12    230.25  2002-09-16
2002-09-13    230.25  2002-09-16
2002-09-14    230.25  2002-09-16
2002-09-15    230.25  2002-09-16
2002-09-16    230.25  2002-09-16
2002-09-17    230.10  2002-09-18
2002-09-18    230.10  2002-09-18
2002-09-19    230.05  2002-09-19

Any help?

asked Nov 11 '12 at 16:11 by pvncad


1 Answer

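After converting the Series to a DataFrame, copy the index into its own column. (DatetimeIndex.format() is useful here because it returns a string representation of the index rather than Timestamp/datetime objects. In recent pandas releases Index.format() is deprecated, and df.index.strftime('%Y-%m-%d') gives the same strings.)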

In [510]: df = pd.DataFrame(data)

In [511]: df['OrigDate'] = df.index.format()

In [513]: df
Out[513]: 
             Price    OrigDate
Date                          
2002-09-09  233.25  2002-09-09
2002-09-11  233.05  2002-09-11
2002-09-16  230.25  2002-09-16
2002-09-18  230.10  2002-09-18
2002-09-19  230.05  2002-09-19
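Because Index.format() is deprecated in recent pandas releases, here is a sketch of the same step using DatetimeIndex.strftime(), which also yields plain strings. It assumes the data Series from the question and a recent pandas version.

```python
import pandas as pd

data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

df = data.to_frame()  # single column named "Price"
# Store each row's own date as a string so it travels with the row
# when the frame is later upsampled and filled.
df["OrigDate"] = df.index.strftime("%Y-%m-%d")
print(df)
```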

For resampling without aggregation, there is a helper method asfreq().

In [528]: df.asfreq("D", method='bfill')
Out[528]: 
             Price    OrigDate
2002-09-09  233.25  2002-09-09
2002-09-10  233.05  2002-09-11
2002-09-11  233.05  2002-09-11
2002-09-12  230.25  2002-09-16
2002-09-13  230.25  2002-09-16
2002-09-14  230.25  2002-09-16
2002-09-15  230.25  2002-09-16
2002-09-16  230.25  2002-09-16
2002-09-17  230.10  2002-09-18
2002-09-18  230.10  2002-09-18
2002-09-19  230.05  2002-09-19
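If you would rather keep the source date as a Timestamp than as a string, the same asfreq() call works, and it also allows date arithmetic on the result. In this sketch the column name DaysAhead is a hypothetical illustration, not part of the original answer. It counts how many days after each row the copied value was actually observed.

```python
import pandas as pd

data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)

df = data.to_frame()
df["OrigDate"] = df.index  # keep the source date as a datetime column

# asfreq upsamples without aggregation; method="bfill" copies each whole row,
# including OrigDate, back into the gap before it.
filled = df.asfreq("D", method="bfill")

# Hypothetical extra column: days between the filled date and its source row.
filled["DaysAhead"] = (filled["OrigDate"] - filled.index.to_series()).dt.days
print(filled)
```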

This is effectively shorthand for the following, where last() is invoked on each daily bin of the intermediate groupby object and the empty bins are then back-filled.

In [529]: df.resample("D", how='last', fill_method='bfill')
Out[529]: 
             Price    OrigDate
Date                          
2002-09-09  233.25  2002-09-09
2002-09-10  233.05  2002-09-11
2002-09-11  233.05  2002-09-11
2002-09-12  230.25  2002-09-16
2002-09-13  230.25  2002-09-16
2002-09-14  230.25  2002-09-16
2002-09-15  230.25  2002-09-16
2002-09-16  230.25  2002-09-16
2002-09-17  230.10  2002-09-18
2002-09-18  230.10  2002-09-18
2002-09-19  230.05  2002-09-19
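The how and fill_method keywords of resample() have been removed in current pandas. Assuming a recent version, the modern spelling chains last() and bfill(). Here is a sketch, along with a check that it agrees with the asfreq() shortcut:

```python
import pandas as pd

data = pd.Series(
    [233.25, 233.05, 230.25, 230.10, 230.05],
    index=pd.DatetimeIndex(
        ["2002-09-09", "2002-09-11", "2002-09-16", "2002-09-18", "2002-09-19"],
        name="Date",
    ),
    name="Price",
)
df = data.to_frame()
df["OrigDate"] = df.index.strftime("%Y-%m-%d")

# Modern form of resample("D", how='last', fill_method='bfill'):
# take the last row in each daily bin, then back-fill the empty bins.
via_resample = df.resample("D").last().bfill()

# Aggregation-free shortcut from the answer.
via_asfreq = df.asfreq("D", method="bfill")

print(via_resample)
# Element-wise comparison of the two daily tables.
print((via_resample == via_asfreq).all().all())
```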
answered Oct 01 '22 at 12:10 by Garrett