Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas resample interpolate is producing NaNs

Modified from this example:

import io
import pandas as pd
import matplotlib.pyplot as plt

data = io.StringIO('''\
Values
1992-08-27 07:46:48,1
1992-08-27 08:00:48,2
1992-08-27 08:33:48,4
1992-08-27 08:43:48,3
1992-08-27 08:48:48,1
1992-08-27 08:51:48,5
1992-08-27 08:53:48,4
1992-08-27 08:56:48,2
1992-08-27 09:03:48,1
''')
s = pd.read_csv(data, squeeze=True)
s.index = pd.to_datetime(s.index)

res = s.resample('4s').interpolate('linear')
print(res)
plt.plot(res, '.-')
plt.plot(s, 'o')
plt.grid(True)

It works as expected:

1992-08-27 07:46:48    1.000000
1992-08-27 07:46:52    1.004762
1992-08-27 07:46:56    1.009524
1992-08-27 07:47:00    1.014286
1992-08-27 07:47:04    1.019048
1992-08-27 07:47:08    1.023810
1992-08-27 07:47:12    1.028571
....

interpolated values

but if I change the resample to '5s', it produces only NaNs:

1992-08-27 07:46:45   NaN
1992-08-27 07:46:50   NaN
1992-08-27 07:46:55   NaN
1992-08-27 07:47:00   NaN
1992-08-27 07:47:05   NaN
1992-08-27 07:47:10   NaN
1992-08-27 07:47:15   NaN
....

Why?

like image 756
endolith Avatar asked Nov 07 '17 01:11

endolith


People also ask

Does Panda resample ignore NaN?

The problem is, that it ignores the nans.

Is resampling the same as interpolation?

Resampling is used to either increase the sample rate (make the image larger) or decrease it (make the image smaller). Interpolation is the process of calculating values between sample points. So, if you resample an image you can use interpolation to do it.

What does resample pandas do?

Pandas Series: resample() function The resample() function is used to resample time-series data. Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

How do pandas interpolate missing values?

You can interpolate missing values ( NaN ) in pandas. DataFrame and Series with interpolate() . This article describes the following contents. Use dropna() and fillna() to remove missing values NaN or to fill them with a specific value.


1 Answers

Option 1
That's because '4s' aligns perfectly with your existing index. When you resample, you get representation from your old series and are able to interpolate. What you want to do is to create an index that is the union of the old index with a new index. Then interpolate and reindex with a new index.

oidx = s.index
nidx = pd.date_range(oidx.min(), oidx.max(), freq='5s')
res = s.reindex(oidx.union(nidx)).interpolate('index').reindex(nidx)
res.plot(style='.-')
s.plot(style='o')

enter image description here


Option 2A
If you are willing to forgo accuracy, you can ffill with a limit of 1

res = s.resample('5s').ffill(limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

enter image description here


Option 2B
Same thing with bfill

res = s.resample('5s').bfill(limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

enter image description here


Option 3
Intermediate complexity and accuracy

nidx = pd.date_range(oidx.min(), oidx.max(), freq='5s')
res = s.reindex(nidx, method='nearest', limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

enter image description here

like image 135
piRSquared Avatar answered Sep 24 '22 02:09

piRSquared