
numpy.polyfit doesn't handle NaN values

Tags: python, nan, numpy

I have a problem with this piece of Python code:

import matplotlib
matplotlib.use("Agg")

import numpy as np
import pylab as pl

A1=np.loadtxt('/tmp/A1.txt',delimiter=',')
A1_extrema = [min(A1),max(A1)]
A2=np.loadtxt('/tmp/A2.txt',delimiter=',')

pl.close()
ab = np.polyfit(A1, A2, 1)      # linear fit: [slope, intercept]
print(ab)
fit = np.poly1d(ab)
print(fit)
r2 = np.corrcoef(A1, A2)[0, 1]  # Pearson correlation coefficient
print(r2)
pl.plot(A1,A2,'r.', label='TMP36 vs. DS18B20', alpha=0.7)
pl.plot(A1_extrema,fit(A1_extrema),'c-')
pl.annotate('{0}'.format(r2), xy=(min(A1)+0.5, fit(min(A1))), size=6, color='r')

pl.title('Sensor correlations')
pl.xlabel("T(x) [degC]")
pl.ylabel("T(y) [degC]")
pl.grid(True)
pl.legend(loc='upper left', prop={'size':8})
pl.savefig('/tmp/C123.png')

A1 and A2 are arrays containing temperature readings from different sensors. I want to find the correlation between the two and show it graphically. Occasionally, however, a sensor read error occurs, and a NaN is then written to one of the files instead of a temperature value. When that happens, np.polyfit refuses to fit the data and returns [nan, nan], and everything after that fails as well.

My question: how can I convince numpy.polyfit to ignore the NaN values?

N.B.: The datasets are relatively small at the moment, but I expect them to grow to about 200k...600k elements once deployed.
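For reference, a minimal sketch of the failure mode (not part of the original script; the exact behaviour depends on the NumPy version, which either returns NaN coefficients or raises a LinAlgError):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, np.nan, 6.0, 8.0])   # one bad sensor reading

# Depending on the NumPy/LAPACK version, polyfit either returns
# NaN coefficients (e.g. [nan nan]) or fails outright.
try:
    print(np.polyfit(x, y, 1))
except np.linalg.LinAlgError as err:
    print("polyfit failed:", err)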

Asked by Mausy5043 on Feb 21 '15
1 Answer

I know this is a little old, but if your arrays contain NaNs you have to "clean them up" by keeping only the indices where both arrays are finite. The way to do this is:

idx = np.isfinite(x) & np.isfinite(y)   # mask of positions where both x and y are finite
ab = np.polyfit(x[idx], y[idx], 1)      # fit using only the valid pairs

That way you pass only the "good" points to polyfit.
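Applied to the arrays from the question, that looks roughly like the sketch below (assuming A1 and A2 have equal length and NaN is the only kind of bad value; the file paths are the ones from the question):

import numpy as np

A1 = np.loadtxt('/tmp/A1.txt', delimiter=',')
A2 = np.loadtxt('/tmp/A2.txt', delimiter=',')

# keep only the positions where both sensors delivered a valid reading
idx = np.isfinite(A1) & np.isfinite(A2)

ab = np.polyfit(A1[idx], A2[idx], 1)        # linear fit on the cleaned data
fit = np.poly1d(ab)
r2 = np.corrcoef(A1[idx], A2[idx])[0, 1]    # correlation on the same subset

fit and r2 can then be used in the plotting code exactly as before; only the inputs to polyfit and corrcoef change.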

Answered by TomCho on Sep 20 '22