How to ignore NaN data points in a NumPy array and generate normalized data in Python?

Say I have a NumPy array that contains some float('nan') values. I don't want to impute those values yet; I want to normalize the data first and keep the NaN entries at their original positions. Is there any way to do that?

Previously I used the normalize function in sklearn.preprocessing, but that function does not seem to accept an array that contains NaN values as input.

xxx222 asked Jun 10 '16 13:06



2 Answers

You can mask the NaN entries of your array with numpy.ma.array and then apply NumPy operations such as sum or std, which skip the masked values:

import numpy as np

a = np.random.rand(10)            # Generate random data.
a = np.where(a > 0.8, np.nan, a)  # Set all data larger than 0.8 to NaN

a = np.ma.array(a, mask=np.isnan(a)) # Use a mask to mark the NaNs

a_norm  = a / np.sum(a) # The sum function ignores the masked values.
a_norm2 = a / np.std(a) # The std function ignores the masked values.

You can still access your raw data:

print(a.data)
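
Building on the masked-array approach above, here is a minimal sketch of a column-wise z-score that returns a plain ndarray with NaN restored at the original positions. The 2-D `data` array and the choice of z-score scaling are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np

# Hypothetical 2-D data: rows are samples, columns are features.
data = np.array([[1.0, 10.0],
                 [2.0, np.nan],
                 [3.0, 30.0]])

# masked_invalid masks NaN (and inf) entries in one step.
masked = np.ma.masked_invalid(data)

# Column-wise z-score; mean and std skip the masked entries.
z = (masked - masked.mean(axis=0)) / masked.std(axis=0)

# Convert back to a regular ndarray, putting NaN where values were masked.
result = z.filled(np.nan)
print(result)
```

Calling filled(np.nan) at the end is what keeps the NaN entries "at the original space", so the result can be passed on to an imputation step later.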
Chiel answered Oct 13 '22 17:10


You can use numpy.nansum to compute the norm while ignoring NaN values:

In [54]: x
Out[54]: array([  1.,   2.,  nan,   3.])

Here's the norm with nan ignored:

In [55]: np.sqrt(np.nansum(np.square(x)))
Out[55]: 3.7416573867739413

y is the normalized array:

In [56]: y = x / np.sqrt(np.nansum(np.square(x)))

In [57]: y
Out[57]: array([ 0.26726124,  0.53452248,         nan,  0.80178373])

In [58]: np.linalg.norm(y[~np.isnan(y)])
Out[58]: 1.0
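
The same idea can be extended to 2-D input, scaling each row to unit L2 norm the way sklearn.preprocessing.normalize does by default. This is only a sketch: the helper name `nan_l2_normalize` and the example array are hypothetical, and treating a zero norm as 1 is a design assumption to avoid division by zero.

```python
import numpy as np

def nan_l2_normalize(x, axis=-1):
    """Scale x to unit L2 norm along `axis`, ignoring NaN entries.

    Hypothetical helper for illustration; NaN entries stay NaN in the output.
    """
    x = np.asarray(x, dtype=float)
    # nansum treats NaN as zero; keepdims lets the norms broadcast against x.
    norms = np.sqrt(np.nansum(np.square(x), axis=axis, keepdims=True))
    # Assumption: slices whose norm is zero are left unscaled.
    norms[norms == 0] = 1.0
    return x / norms

x = np.array([[1.0, 2.0, np.nan, 3.0],
              [0.0, 4.0, 3.0, np.nan]])
y = nan_l2_normalize(x, axis=1)
print(y)
```

Each row of y then has unit norm over its non-NaN entries, which can be checked with np.nansum(y**2, axis=1).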
Warren Weckesser answered Oct 13 '22 19:10