Forced conversion of non-numeric numpy arrays with NAN replacement

Consider the array

x = np.array(['1', '2', 'a'])

Trying to convert it to a float array raises an exception:

x.astype(float)  # the np.float alias was removed in NumPy 1.24; the builtin float is equivalent
ValueError: could not convert string to float: a

Does NumPy provide an efficient way to coerce this into a numeric array, replacing non-numeric values with something like NaN?

Alternatively, is there an efficient numpy function equivalent to np.isnan, but which also tests for non-numeric elements like letters?

ChrisB asked Apr 25 '13 19:04


2 Answers

You can convert an array of strings into an array of floats (with NaNs) using np.genfromtxt:

In [83]: np.set_printoptions(precision=3, suppress=True)

In [84]: np.genfromtxt(np.array(['1','2','3.14','1e-3','b','nan','inf','-inf']))
Out[84]: array([ 1.   ,  2.   ,  3.14 ,  0.001,    nan,    nan,    inf,   -inf])
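
As a minimal sketch of how this applies to the array from the question, the result of np.genfromtxt can be passed to np.isnan to get a mask of the entries that failed to parse. The variable names values and non_numeric are illustrative only, and the sketch assumes Python 3 and the default genfromtxt settings (one value per string, whitespace-delimited).

```python
import numpy as np

# The array from the question: two numeric strings and one letter.
x = np.array(['1', '2', 'a'])

# genfromtxt treats each string as one line of input; with the default
# settings, entries that cannot be parsed as floats come back as nan.
values = np.genfromtxt(x)

# np.isnan on the result flags the entries that did not parse.
# Caveat: a literal 'nan' string is flagged too, since it parses to nan.
non_numeric = np.isnan(values)

print(values)
print(non_numeric)
```

Note that this mask cannot tell a string that failed to parse apart from one that legitimately spells out nan, so it only approximates a "non-numeric" test.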

Here is a way to identify "numeric" strings:

In [34]: x
Out[34]: 
array(['1', '2', 'a'], 
      dtype='|S1')

In [35]: x.astype('unicode')
Out[35]: 
array([u'1', u'2', u'a'], 
      dtype='<U1')

In [36]: np.char.isnumeric(x.astype('unicode'))
Out[36]: array([ True,  True, False], dtype=bool)

Note that "numeric" here means a unicode string made up only of characters that have the Unicode numeric value property, such as digits. The decimal point is not one of them, and neither is a minus sign, so u'1.3' and u'-1' are not considered "numeric".
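
To illustrate that limitation, here is a rough sketch comparing np.char.isnumeric with a check based on Python's float(). The helper is_float_string is hypothetical (it is not part of NumPy), and the sketch assumes Python 3, where strings are already unicode and no astype('unicode') step is needed.

```python
import numpy as np

def is_float_string(s):
    """Hypothetical helper: True if Python's float() can parse s."""
    try:
        float(s)
        return True
    except ValueError:
        return False

x = np.array(['1', '2', 'a', '1.3', '-4'])

# True only where every character has the Unicode numeric property,
# so '1.3' and '-4' are rejected along with 'a'.
digits_only = np.char.isnumeric(x)

# Element-by-element Python loop: slower, but accepts anything float() does.
parses_as_float = np.array([is_float_string(s) for s in x])

print(digits_only)
print(parses_as_float)
```

The float()-based check also accepts strings such as 'nan', 'inf' and '1e-3', which may or may not be what you want.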

unutbu answered Oct 10 '22 08:10


If you happen to be using pandas as well, you could use the pd.to_numeric() function with errors='coerce', which replaces values that cannot be parsed with NaN:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: x = np.array(['1', '2', 'a'])

In [4]: pd.to_numeric(x, errors='coerce')
Out[4]: array([  1.,   2.,  nan])

Bill answered Oct 10 '22 08:10