
Numpy masked arrays - indicating missing values

import numpy as np
import numpy.ma as ma

# This works as expected: only one value is masked.
a = [0., 1., 1.e20, 9.]
error_value = 1.e20
b = ma.masked_values(a, error_value)
print(b)

# This does not: all values are masked.
d = [0., 1., 'NA', 9.]
error_value = 'NA'
e = ma.masked_values(d, error_value)
print(e)

How can I use 'nan', 'NA', 'None', or some similar value to indicate missing data?

Dick Eshelman asked Jul 02 '11 04:07



1 Answer

Are you getting your data from a text file or similar? If so, I'd suggest using the genfromtxt function directly to specify your masked value:

In [148]: from io import StringIO

In [149]: f = StringIO('0.0, 1.0, NA, 9.0')

In [150]: a = np.genfromtxt(f, delimiter=',', missing_values='NA', usemask=True)

In [151]: a
Out[151]:
masked_array(data = [0.0 1.0 -- 9.0],
             mask = [False False  True False],
       fill_value = 1e+20)
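Once the data is in a masked array, the masked entry is skipped by reductions and can be swapped for a placeholder when needed. A minimal sketch, assuming the same one-line input as above; the placeholder `-1.0` is an arbitrary choice for illustration:

```python
from io import StringIO

import numpy as np

# Same comma-separated input as in the session above.
f = StringIO('0.0, 1.0, NA, 9.0')
a = np.genfromtxt(f, delimiter=',', missing_values='NA', usemask=True)

# Reductions ignore the masked entry.
mean_of_valid = a.mean()

# filled() returns a plain ndarray with masked entries replaced.
# -1.0 is just an illustrative placeholder, not a recommended value.
filled = a.filled(-1.0)

print(a.mask.tolist())
print(mean_of_valid)
print(filled.tolist())
```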

I think the problem in your example is that the Python list you're using to initialize the NumPy array has heterogeneous types (floats and a string). The values are all coerced to strings in the NumPy array, but the masked_values function uses floating point equality, which yields the strange results.
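You can check the coercion yourself by looking at the dtype NumPy picks for the mixed list. A quick sketch:

```python
import numpy as np

# Mixing floats and a string makes NumPy choose a string dtype for every element.
d = np.array([0., 1., 'NA', 9.])

print(d.dtype.kind)   # 'U' means a unicode string dtype
print(d.tolist())     # every element is now a str, including the numbers
```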

Here's one way to overcome this by creating an array with object dtype:

In [152]: d = np.array([0., 1., 'NA', 9.], dtype=object)

In [153]: e = ma.masked_values(d, 'NA')

In [154]: e
Out[154]:
masked_array(data = [0.0 1.0 -- 9.0],
             mask = [False False  True False],
       fill_value = ?)

You may prefer the first solution since the result has a float dtype.
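If the data is already in a Python list rather than a file, another option that keeps a float dtype is to turn the marker into NaN first and then mask the invalid entries. This is only a sketch: the list name `raw` and the list comprehension are my own illustration, and it assumes 'NA' is the only non-numeric marker in the data.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical input: floats with a string marker for missing data.
raw = [0., 1., 'NA', 9.]

# Replace the marker with NaN so the array can be built with a float dtype.
values = np.array([np.nan if v == 'NA' else v for v in raw], dtype=float)

# masked_invalid masks NaN (and inf) entries.
e = ma.masked_invalid(values)

print(e.mask.tolist())
print(e.mean())   # mean over the unmasked values only
```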

ars answered Nov 14 '22 14:11