I am resampling a pandas time series. The series consists of binary values (it is a categorical variable) with no missing values, yet after resampling NaNs appear. How is this possible?
I can't post example data here because it is sensitive, but I create and resample the series as follows:
series = pd.Series(data, ts)
series_rs = series.resample('60T', how='mean')
The problem is that it ignores the NaNs.
Pandas Series: resample() function
The resample() function is used to resample time-series data. It is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or you must pass datetime-like values to the on or level keyword.
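As a minimal sketch of the on keyword: when the timestamps live in a column rather than in the index, you can name that column instead of calling set_index first. The DataFrame and its column names ts and value are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical DataFrame: timestamps are stored in the "ts" column,
# not in the index.
df = pd.DataFrame({
    "ts": pd.date_range("2015-01-01", periods=6, freq="30min"),
    "value": [0, 1, 1, 0, 1, 1],
})

# on="ts" tells resample which column holds the datetime values.
hourly = df.resample("60min", on="ts").mean()
print(hourly)
```

Each hourly bin here holds two 30-minute samples, so no bin is empty and no NaN appears.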
Quoting the documentation, resample is a "convenience method for frequency conversion and resampling of time series." In practice, there are two main reasons to use resample: to inspect how data behaves at different resolutions or frequencies, and to join tables that have different resolutions.
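The second use case can be sketched as follows, assuming two hypothetical series: one sampled every 15 minutes and one sampled hourly. Downsampling the finer series to hourly means puts both on the same index, after which they align column-wise.

```python
import pandas as pd

# Hypothetical data: one reading every 15 minutes, and one per hour.
fine = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    index=pd.date_range("2015-01-01", periods=8, freq="15min"),
    name="fine",
)
coarse = pd.Series(
    [10.0, 20.0],
    index=pd.date_range("2015-01-01", periods=2, freq="60min"),
    name="coarse",
)

# Downsample the fine series to hourly means so both share one resolution...
fine_hourly = fine.resample("60min").mean()

# ...then align the two series side by side on the shared timestamps.
joined = pd.concat([fine_hourly, coarse], axis=1)
print(joined)
```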
Use the DataFrame.fillna(0) method to replace NaN/None values with 0. It doesn't modify the original object; it returns a new DataFrame.
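Series has the same fillna method. A minimal sketch on a toy series with one sample every 100 minutes (hypothetical data, not the asker's) shows that the resampled series keeps its NaNs while the filled copy has none. For binary data, note that 0 is itself a real category, so filling with 0 treats "no samples in this hour" the same as "value 0".

```python
import pandas as pd

# Toy series: one sample every 100 minutes.
ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

hourly = series.resample("60min").mean()  # hours without a sample become NaN
filled = hourly.fillna(0)                 # returns a new Series; hourly is unchanged

print(hourly.isna().sum(), filled.isna().sum())
```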
Upsampling converts the series to a regular time interval, so any interval that contains no original sample gets NaN.
You can fill the missing values backward with fill_method='bfill', or forward with fill_method='ffill' (or its alias fill_method='pad').
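On pandas 0.18 and later, the same forward and backward filling is done by aggregating first and then calling ffill() or bfill() on the result. A minimal sketch on a toy series with one sample every 100 minutes:

```python
import pandas as pd

ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

hourly = series.resample("60min").mean()

# Aggregate first, then fill the empty (NaN) bins.
forward = hourly.ffill()   # a NaN bin takes the previous bin's value
backward = hourly.bfill()  # a NaN bin takes the next bin's value

print(pd.DataFrame({"mean": hourly, "ffill": forward, "bfill": backward}))
```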
import pandas as pd
ts = pd.date_range('1/1/2015', periods=10, freq='100T')
data = range(10)
series = pd.Series(data, ts)
print(series)
#2015-01-01 00:00:00 0
#2015-01-01 01:40:00 1
#2015-01-01 03:20:00 2
#2015-01-01 05:00:00 3
#2015-01-01 06:40:00 4
#2015-01-01 08:20:00 5
#2015-01-01 10:00:00 6
#2015-01-01 11:40:00 7
#2015-01-01 13:20:00 8
#2015-01-01 15:00:00 9
#Freq: 100T, dtype: int64
series_rs = series.resample('60T', how='mean')  # pre-0.18 API, see the note below
print(series_rs)
#2015-01-01 00:00:00 0
#2015-01-01 01:00:00 1
#2015-01-01 02:00:00 NaN
#2015-01-01 03:00:00 2
#2015-01-01 04:00:00 NaN
#2015-01-01 05:00:00 3
#2015-01-01 06:00:00 4
#2015-01-01 07:00:00 NaN
#2015-01-01 08:00:00 5
#2015-01-01 09:00:00 NaN
#2015-01-01 10:00:00 6
#2015-01-01 11:00:00 7
#2015-01-01 12:00:00 NaN
#2015-01-01 13:00:00 8
#2015-01-01 14:00:00 NaN
#2015-01-01 15:00:00 9
#Freq: 60T, dtype: float64
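The NaNs in this output come from hourly bins that contain none of the original 100-minute samples, and the mean of an empty bin is undefined. A minimal sketch to confirm this: count the samples per bin and compare with where the mean is NaN.

```python
import pandas as pd

ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

resampler = series.resample("60min")
means = resampler.mean()
counts = resampler.count()  # number of original samples in each hourly bin

# Every NaN in the mean corresponds to a bin with zero samples.
empty_bins = counts[counts == 0].index
print(empty_bins)
print(means.isna().equals(counts == 0))
```

The same check on real data shows which hours simply had no observations, even when the input series itself has no missing values.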
series_rs = series.resample('60T', how='mean', fill_method='bfill')  # pre-0.18 API
print(series_rs)
#2015-01-01 00:00:00 0
#2015-01-01 01:00:00 1
#2015-01-01 02:00:00 2
#2015-01-01 03:00:00 2
#2015-01-01 04:00:00 3
#2015-01-01 05:00:00 3
#2015-01-01 06:00:00 4
#2015-01-01 07:00:00 5
#2015-01-01 08:00:00 5
#2015-01-01 09:00:00 6
#2015-01-01 10:00:00 6
#2015-01-01 11:00:00 7
#2015-01-01 12:00:00 8
#2015-01-01 13:00:00 8
#2015-01-01 14:00:00 9
#2015-01-01 15:00:00 9
#Freq: 60T, dtype: float64
Please note that fill_method (like how) was deprecated in pandas 0.18.0 and has since been removed. resample() now returns a Resampler object on which you perform operations, much like a groupby object.
Common downsampling operations: .mean(), .sum(), .agg(), .apply()
Upsampling operations: .ffill(), .bfill()
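A minimal sketch of both kinds of operations on the same Resampler object. Note that calling ffill() or bfill() directly on the Resampler is not the same as aggregating and then filling: it reindexes the original values onto the hourly labels, so the 01:00 label forward-fills from the 00:00 sample instead of using the 01:40 sample that falls inside that bin.

```python
import pandas as pd

ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

r = series.resample("60min")  # a Resampler, used much like a groupby object

# Downsampling-style aggregation: several statistics per bin at once.
summary = r.agg(["mean", "sum", "count"])

# Upsampling-style fills on the Resampler itself: each hourly label takes
# the last original value at or before it (ffill) or the next one (bfill).
up_ffill = r.ffill()
up_bfill = r.bfill()

print(summary.head())
```

For an empty bin, mean is NaN while sum is 0 and count is 0, which is another way to spot hours without data.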
See the "What's new" entry in the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#whatsnew-0180-breaking-resample
So the example becomes:
series_rs = series.resample('60T').mean()
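The fill_method='bfill' call from the earlier example can be rewritten the same way, by chaining .bfill() after .mean(). A minimal sketch, using the '60min' alias (newer pandas versions deprecate the 'T' alias for minutes in favour of 'min'); the resulting values match the bfill output shown earlier.

```python
import pandas as pd

ts = pd.date_range("2015-01-01", periods=10, freq="100min")
series = pd.Series(range(10), index=ts)

# Old: series.resample('60T', how='mean', fill_method='bfill')
# New: aggregate with .mean(), then back-fill the empty bins.
series_rs = series.resample("60min").mean().bfill()
print(series_rs)
```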