pandas resample interpolate is producing NaNs

Tags:

python

pandas

interpolation

Modified from this example:

import io
import pandas as pd
import matplotlib.pyplot as plt

data = io.StringIO('''\
Values
1992-08-27 07:46:48,1
1992-08-27 08:00:48,2
1992-08-27 08:33:48,4
1992-08-27 08:43:48,3
1992-08-27 08:48:48,1
1992-08-27 08:51:48,5
1992-08-27 08:53:48,4
1992-08-27 08:56:48,2
1992-08-27 09:03:48,1
''')
# read_csv's `squeeze=True` was removed in pandas 2.0; squeeze the one-column frame instead
s = pd.read_csv(data).squeeze('columns')
s.index = pd.to_datetime(s.index)

res = s.resample('4s').interpolate('linear')
print(res)
plt.plot(res, '.-')
plt.plot(s, 'o')
plt.grid(True)

It works as expected:

1992-08-27 07:46:48    1.000000
1992-08-27 07:46:52    1.004762
1992-08-27 07:46:56    1.009524
1992-08-27 07:47:00    1.014286
1992-08-27 07:47:04    1.019048
1992-08-27 07:47:08    1.023810
1992-08-27 07:47:12    1.028571
....

Figure: interpolated values (https://i.stack.imgur.com/tlkcz.png)

But if I change the resample frequency to '5s', it produces only NaNs:

1992-08-27 07:46:45   NaN
1992-08-27 07:46:50   NaN
1992-08-27 07:46:55   NaN
1992-08-27 07:47:00   NaN
1992-08-27 07:47:05   NaN
1992-08-27 07:47:10   NaN
1992-08-27 07:47:15   NaN
....

Why?


asked Nov 07 '17 01:11

endolith

1 Answer

Option 1
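That's because the '4s' grid lines up exactly with your existing index. Every original timestamp falls at :48 seconds past a whole minute, which is a point on the 4-second grid, so each sample survives the resample and interpolate has values to work between. The '5s' grid (:45, :50, :55, ...) never coincides with an original timestamp, so the resampled series contains no original values at all and there is nothing to interpolate from. What you want is an index that is the union of the old index and the new one: reindex to that union, interpolate, and then reindex to the new index alone.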

oidx = s.index
nidx = pd.date_range(oidx.min(), oidx.max(), freq='5s')
res = s.reindex(oidx.union(nidx)).interpolate('index').reindex(nidx)
res.plot(style='.-')
s.plot(style='o')

Figure: Option 1 result (dots and line) over the original samples (circles) (https://i.stack.imgur.com/BdQ1K.png)
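The alignment argument can be checked directly. The following is a minimal sketch, not part of the original answer: the helper name `on_grid` is made up for illustration, the series is built by hand instead of via `read_csv`, and `interpolate('time')` is used where the answer uses `'index'`. It counts how many original timestamps sit on each grid, confirms that the 5-second grid carries no original values, and then applies the union idea to that same grid.

```python
import pandas as pd

# Same data as in the question, built directly as a Series
s = pd.Series(
    [1, 2, 4, 3, 1, 5, 4, 2, 1],
    index=pd.to_datetime([
        '1992-08-27 07:46:48', '1992-08-27 08:00:48', '1992-08-27 08:33:48',
        '1992-08-27 08:43:48', '1992-08-27 08:48:48', '1992-08-27 08:51:48',
        '1992-08-27 08:53:48', '1992-08-27 08:56:48', '1992-08-27 09:03:48',
    ]),
    name='Values',
)


def on_grid(series, freq):
    """Count the timestamps of `series` that fall exactly on the `freq` grid."""
    idx = series.index
    return int((idx == idx.floor(freq)).sum())


aligned_4s = on_grid(s, '4s')
aligned_5s = on_grid(s, '5s')
print(aligned_4s, aligned_5s)  # 9 0

# Conforming to the 5 s grid keeps none of the original values
on_5s_grid = s.resample('5s').asfreq()
all_nan = bool(on_5s_grid.isna().all())
print(all_nan)  # True

# Union approach on that same grid: add the grid points, interpolate by time,
# then keep only the grid points
grid = on_5s_grid.index
res = s.reindex(s.index.union(grid)).interpolate('time').reindex(grid)
print(res.head())
```

The first grid point (07:46:45) lies before the first sample, so it stays NaN; the points after it are weighted by their true distance in time from the neighbouring samples.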

Option 2A
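If you are willing to give up some accuracy, you can ffill with a limit of 1 first, which moves each original sample onto the next grid point, and then interpolate between those: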

res = s.resample('5s').ffill(limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

Figure: Option 2A result (dots and line) over the original samples (circles) (https://i.stack.imgur.com/MBecc.png)

Option 2B
The same idea with bfill, which moves each original sample onto the previous grid point instead:

res = s.resample('5s').bfill(limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

Figure: Option 2B result (dots and line) over the original samples (circles) (https://i.stack.imgur.com/3NQOn.png)

Option 3
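Intermediate in complexity and accuracy: reindex onto the new grid with method='nearest' and a limit of 1, then interpolate.

oidx = s.index  # as in Option 1
nidx = pd.date_range(oidx.min(), oidx.max(), freq='5s')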
res = s.reindex(nidx, method='nearest', limit=1).interpolate()
res.plot(style='.-')
s.plot(style='o')

Figure: Option 3 result (dots and line) over the original samples (circles) (https://i.stack.imgur.com/G9E2X.png)
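To see how much accuracy Options 2A and 2B actually give up on a particular data set, one can compare them against exact time-weighted interpolation. The sketch below is an illustration rather than part of the original answer: it uses `np.interp` as the reference, and the helper `exact_at` and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built directly as a Series
s = pd.Series(
    [1, 2, 4, 3, 1, 5, 4, 2, 1],
    index=pd.to_datetime([
        '1992-08-27 07:46:48', '1992-08-27 08:00:48', '1992-08-27 08:33:48',
        '1992-08-27 08:43:48', '1992-08-27 08:48:48', '1992-08-27 08:51:48',
        '1992-08-27 08:53:48', '1992-08-27 08:56:48', '1992-08-27 09:03:48',
    ]),
    name='Values',
)


def exact_at(index, series):
    """Reference values: linear interpolation of `series` in time, via np.interp."""
    t0 = series.index[0]
    x = (index - t0).total_seconds()          # query times, seconds from first sample
    xp = (series.index - t0).total_seconds()  # sample times, seconds from first sample
    return pd.Series(np.interp(x, xp, series.to_numpy(dtype=float)), index=index)


opt_2a = s.resample('5s').ffill(limit=1).interpolate()
opt_2b = s.resample('5s').bfill(limit=1).interpolate()

# Restrict the comparison to grid points inside the original time span
inside = (opt_2a.index >= s.index[0]) & (opt_2a.index <= s.index[-1])
ref = exact_at(opt_2a.index[inside], s)

err_2a = (opt_2a[inside] - ref).abs().max()
err_2b = (opt_2b[inside] - ref).abs().max()
print(f'max abs error  2A: {err_2a:.4f}  2B: {err_2b:.4f}')
```

The resulting numbers depend on the data and on the grid, so they are not quoted here; the point is only that a check like this makes the trade-off between the options visible before choosing one.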


answered Sep 24 '22 02:09

piRSquared
