 

Cannot interpolate dataframe even if most of the data is filled

Tags: python, pandas

I tried to interpolate the NaN values in my DataFrame using the interpolate() method. However, the method failed with this error:

Cannot interpolate with all NaNs.

Here's the code:

try:
    df3.interpolate(method='index', inplace=True)
    processor._arma(df3['TCA'])
except Exception as e:
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, e))
    sys.stderr.write('%s: [%s] len=%d\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, len(df3.index)))
    sys.stderr.write('%s: [%s] %s\n' % (time.strftime("%Y-%m-%d %H:%M:%S"), nid3, df3.to_string()))

This is strange, because most of the data is already filled, as you can see in log 1 and log 2. The DataFrame has 20 rows, all of which are shown below. Even when every cell is filled (log 1), interpolate() still fails. By the way, df3 is a global variable; I'm not sure whether that could be a problem.


log 1

2016-01-21 22:06:11: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:11: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:11: [ESIG_node_003_400585511]
                     TCA TCB TCC
2016-01-21 20:06:22  19  17  18
2016-01-21 20:06:23  19  17  18
2016-01-21 20:06:24  18  18  18
2016-01-21 20:06:25  18  17  18
2016-01-21 20:06:26  18  18  18
2016-01-21 20:06:27  19  18  18
2016-01-21 20:06:28  19  17  18
2016-01-21 20:06:29  18  18  18
2016-01-21 20:06:30  18  17  18
2016-01-21 20:06:31  19  17  18
2016-01-21 20:06:32  18  17  18
2016-01-21 20:06:33  18  18  18
2016-01-21 20:06:34  19  18  18
2016-01-21 20:06:35  18  17  18
2016-01-21 20:06:36  19  18  18
2016-01-21 20:06:37  18  18  18
2016-01-21 20:06:38  18  18  18
2016-01-21 20:06:39  19  18  18
2016-01-21 20:06:40  18  17  18
2016-01-21 20:06:41  18  18  18

log 2

2016-01-21 22:06:14: [ESIG_node_003_400585511] Cannot interpolate with all NaNs.
2016-01-21 22:06:14: [ESIG_node_003_400585511] len=20
2016-01-21 22:06:14: [ESIG_node_003_400585511]
                      TCA  TCB  TCC
2016-01-21 20:06:33   18   18   18
2016-01-21 20:06:34   19   18   18
2016-01-21 20:06:35   18   17   18
2016-01-21 20:06:36   19   18   18
2016-01-21 20:06:37   18   18   18
2016-01-21 20:06:38   18   18   18
2016-01-21 20:06:39   19   18   18
2016-01-21 20:06:40   18   17   18
2016-01-21 20:06:41   18   18   18
2016-01-21 20:06:42  NaN  NaN  NaN
2016-01-21 20:06:43  NaN  NaN  NaN
2016-01-21 20:06:44  NaN  NaN  NaN
2016-01-21 20:06:45  NaN  NaN  NaN
2016-01-21 20:06:46   19   18   18
2016-01-21 20:06:47   18   17   18
2016-01-21 20:06:48   18   18   18
2016-01-21 20:06:49   19   18   18
2016-01-21 20:06:50   18   17   18
2016-01-21 20:06:51   18   18   18
2016-01-21 20:06:52   19   17   18
Mincong Huang asked Jan 21 '16 21:01



2 Answers

Check that your DataFrame has numeric dtypes, not object dtypes. The TypeError: Cannot interpolate with all NaNs can occur if the DataFrame contains columns of object dtype. For example, if

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

then df.interpolate() raises the TypeError.

To check if your DataFrame has columns with object dtype, look at df3.dtypes:

In [92]: df.dtypes
Out[92]: 
A    object
dtype: object
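If the DataFrame has many columns, scanning the dtypes output by eye gets tedious. A minimal sketch of listing only the object columns with select_dtypes, assuming a made-up frame where column 'A' holds numbers stored as Python objects and 'B' is an ordinary float column (both names are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'A' is object dtype, 'B' is float64.
df = pd.DataFrame({
    'A': np.array([1, np.nan, 30], dtype='O'),
    'B': [1.0, 2.0, 3.0],
})

# Keep only the columns whose dtype is object and list their names.
object_cols = df.select_dtypes(include=['object']).columns.tolist()
print(object_cols)
```

Any column name that shows up in this list is a candidate for the conversion described next.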

To fix the problem, you need to ensure the DataFrame has numeric columns with native NumPy dtypes. Obviously, it would be best to build the DataFrame correctly from the very beginning. So the best solution depends on how you are building the DataFrame.

A less appealing patch-up fix would be to use pd.to_numeric to convert the object arrays to numeric arrays after-the-fact:

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')

With errors='coerce', any value that could not be converted to a number is converted to NaN. After calling pd.to_numeric on each column, notice that the dtype is now float64:

In [94]: df.dtypes
Out[94]: 
A    float64
dtype: object
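Because errors='coerce' silently turns unparseable values into NaN, it may be worth counting how many values were lost in the conversion. A small sketch, assuming a hypothetical column that mixes numeric strings with a placeholder such as 'n/a' (the data is invented for illustration); it also uses df.apply as a one-line equivalent of the loop above:

```python
import pandas as pd

# Hypothetical raw data: numbers stored as strings, one placeholder, one missing value.
raw = pd.DataFrame({'TCA': ['18', '19', 'n/a', None]}, dtype=object)

# Apply pd.to_numeric to every column; unparseable values become NaN.
clean = raw.apply(pd.to_numeric, errors='coerce')

# NaNs introduced by coercion = NaNs after conversion minus NaNs that were already there.
n_lost = int(clean['TCA'].isna().sum() - raw['TCA'].isna().sum())
print(clean.dtypes)
print('values lost to coercion:', n_lost)
```

A non-zero count suggests the source data contains something other than numbers, which is usually better fixed where the DataFrame is built.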

Once the DataFrame has numeric dtypes and a DatetimeIndex, df.interpolate(method='time') will work:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':np.array([1,np.nan,30], dtype='O')}, 
                  index=['2016-01-21 20:06:22', '2016-01-21 20:06:23', 
                         '2016-01-21 20:06:24'])

for col in df:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df.index = pd.DatetimeIndex(df.index)
df = df.interpolate(method='time')
print(df)

yields

                        A
2016-01-21 20:06:22   1.0
2016-01-21 20:06:23  15.5
2016-01-21 20:06:24  30.0
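The same steps can be tried on a gap shaped like the one in the question's log 2, where several consecutive rows are NaN. This is only a sketch: the values 18 and 22 are invented so the interpolated steps are easy to see, and the column is assumed to arrive as object dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-second window with a 3-row gap, stored as object dtype.
index = pd.to_datetime([
    '2016-01-21 20:06:41', '2016-01-21 20:06:42', '2016-01-21 20:06:43',
    '2016-01-21 20:06:44', '2016-01-21 20:06:45',
])
df = pd.DataFrame(
    {'TCA': np.array([18, np.nan, np.nan, np.nan, 22], dtype='O')},
    index=index,
)

# Convert to a numeric dtype first, then interpolate along the time axis.
df['TCA'] = pd.to_numeric(df['TCA'], errors='coerce')
filled = df.interpolate(method='time')
print(filled)
```

Since the timestamps are evenly spaced, the gap is filled with evenly spaced values between the two known endpoints.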
unutbu answered Oct 28 '22 16:10



I had a similar problem. Recreating the DataFrame with an explicit float dtype (e.g. dtype='float32') fixed it.

df = pd.DataFrame(data=df.values, index=df.index, columns=df.columns, dtype='float32')
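A possibly simpler variant of the same idea is DataFrame.astype, which casts the columns and keeps the existing index and column labels. A minimal sketch, assuming every value is already a number or NaN (unlike pd.to_numeric with errors='coerce', astype raises an error on strings it cannot parse); the sample data is invented:

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype frame with a DatetimeIndex.
df = pd.DataFrame(
    {'TCA': np.array([18, np.nan, 19], dtype='O')},
    index=pd.to_datetime([
        '2016-01-21 20:06:41', '2016-01-21 20:06:42', '2016-01-21 20:06:43',
    ]),
)

# Cast every column to float64; index and column names are preserved.
converted = df.astype('float64')
print(converted.dtypes)
```

Keeping the DatetimeIndex matters here, because method='index' and method='time' both interpolate against the index values.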
Ramin Nasirpour answered Oct 28 '22 16:10
