
numpy.polyfit doesn't handle NaN values

Tags: python, nan, numpy

I have a problem with this piece of Python code:

import matplotlib
matplotlib.use("Agg")

import numpy as np
import pylab as pl

A1=np.loadtxt('/tmp/A1.txt',delimiter=',')
A1_extrema = [min(A1),max(A1)]
A2=np.loadtxt('/tmp/A2.txt',delimiter=',')

pl.close()
ab = np.polyfit(A1, A2, 1)      # linear fit: [slope, intercept]
print(ab)
fit = np.poly1d(ab)
print(fit)
r2 = np.corrcoef(A1, A2)[0, 1]  # Pearson correlation coefficient
print(r2)
pl.plot(A1,A2,'r.', label='TMP36 vs. DS18B20', alpha=0.7)
pl.plot(A1_extrema,fit(A1_extrema),'c-')
pl.annotate('{0}'.format(r2), xy=(min(A1)+0.5, fit(min(A1))), size=6, color='r')

pl.title('Sensor correlations')
pl.xlabel("T(x) [degC]")
pl.ylabel("T(y) [degC]")
pl.grid(True)
pl.legend(loc='upper left', prop={'size':8})
pl.savefig('/tmp/C123.png')

A1 and A2 are arrays containing temperature readings from different sensors. I want to find the correlation between the two and show it graphically. Occasionally, however, a sensor read error occurs, and a NaN is then written to one of the files instead of a temperature value. When that happens, np.polyfit refuses to fit the data and returns [nan, nan], and everything after that fails as well.

My question: how can I convince numpy.polyfit to ignore the NaN values?

N.B.: The datasets are relatively small at the moment, but I expect them to grow to about 200k...600k elements once deployed.
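For reference, a minimal sketch of the failure mode (not part of the original script; the exact behaviour depends on the NumPy version, which either returns NaN coefficients or raises a LinAlgError):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, np.nan, 6.0, 8.0])   # one bad sensor reading

# Depending on the NumPy/LAPACK version, polyfit either returns
# NaN coefficients (e.g. [nan nan]) or fails outright.
try:
    print(np.polyfit(x, y, 1))
except np.linalg.LinAlgError as err:
    print("polyfit failed:", err)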

Asked by Mausy5043 on Feb 21 '15
1 Answer

I know this is a little old, but if your arrays contain NaNs you have to "clean them up" by keeping only the indices where both arrays are finite. The way to do this is:

idx = np.isfinite(x) & np.isfinite(y)   # mask of positions where both x and y are finite
ab = np.polyfit(x[idx], y[idx], 1)      # fit using only the valid pairs

That way you pass only the "good" points to polyfit.
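Applied to the arrays from the question, that looks roughly like the sketch below (assuming A1 and A2 have equal length and NaN is the only kind of bad value; the file paths are the ones from the question):

import numpy as np

A1 = np.loadtxt('/tmp/A1.txt', delimiter=',')
A2 = np.loadtxt('/tmp/A2.txt', delimiter=',')

# keep only the positions where both sensors delivered a valid reading
idx = np.isfinite(A1) & np.isfinite(A2)

ab = np.polyfit(A1[idx], A2[idx], 1)        # linear fit on the cleaned data
fit = np.poly1d(ab)
r2 = np.corrcoef(A1[idx], A2[idx])[0, 1]    # correlation on the same subset

fit and r2 can then be used in the plotting code exactly as before; only the inputs to polyfit and corrcoef change.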

Answered by TomCho on Sep 20 '22