How to disregard the NaN data point in numpy array and generate the normalized data in Python?

Say I have a NumPy array that contains some float('nan') values. I don't want to impute those values yet; I want to normalize the data first and keep the NaN values in their original positions. Is there any way I can do that?

Previously I used the normalize function in sklearn.preprocessing, but that function doesn't seem to accept an array that contains NaN as input.

How do I skip NaN values in NumPy array?

Use numpy.isnan() to get a boolean mask that is True at the NaN positions, then invert it with the ~ operator (a shorter alternative to numpy.logical_not()) and index the array with the result.
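A minimal sketch of the ~ / isnan() combination on a small 1-D array (the sample values are made up for illustration):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

# np.isnan(a) is True where a holds NaN; ~ flips it to True where a is a real number
valid = ~np.isnan(a)
clean = a[valid]  # boolean indexing keeps only the non-NaN entries

print(clean)  # [1. 3. 5.]
```

Note that `clean` is a new, shorter array, so this approach removes the NaN slots rather than preserving their positions.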

How do I normalize data in Python using NumPy?

To normalize a 2D array or matrix, you can use NumPy. For a matrix, a common choice is the Frobenius norm, i.e. the Euclidean norm taken over all entries (the square root of the sum of their squares). If v is the matrix and ||v|| is its norm (not its determinant, which is a different quantity), then the normalized matrix is v̂ = v / ||v||.
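A short sketch of Frobenius normalization with np.linalg.norm, which computes the Frobenius norm by default for a 2-D array. The example matrix is an arbitrary choice; it is singular on purpose, to show that its determinant (0) has nothing to do with its norm (5):

```python
import numpy as np

v = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Frobenius norm: sqrt(1 + 4 + 4 + 16) = sqrt(25)
fro = np.linalg.norm(v)
v_hat = v / fro  # every entry divided by the same scalar

print(fro)                    # 5.0
print(np.linalg.det(v))       # (close to) zero: the determinant is a different quantity
print(np.linalg.norm(v_hat))  # close to 1.0 (floating-point rounding may apply)
```

This matrix contains no NaN; with NaN present, np.linalg.norm would return nan, which is why the answers below compute the norm with np.nansum instead.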

How do you remove NaN values from an array?

Dropping missing (NaN) values from a NumPy array can be done with numpy.isnan(), which returns a boolean array that is True wherever the value is NaN. Passing that array to numpy.logical_not(), which reverses the boolean values, gives a mask that selects only the non-NaN elements.
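The same idea with numpy.logical_not() spelled out explicitly, applied to a made-up 2-D array. One thing worth knowing: boolean indexing of a 2-D array returns a flat 1-D result, so the matrix shape is lost.

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [3.0, 4.0]])

nan_mask = np.isnan(a)           # True at NaN positions
keep = np.logical_not(nan_mask)  # same result as ~nan_mask

assert np.array_equal(keep, ~nan_mask)
print(a[keep])  # [1. 3. 4.]  (flattened in row-major order)
```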

How do you exclude NaN values in Python?

To remove NaN values from a Python list, the easiest way is to use the isnan() function from the math module together with a list comprehension. You can also use the built-in filter() function. NumPy also provides numpy.isnan(), which checks whether a value is NaN.
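A minimal sketch of both plain-Python approaches. It assumes the list holds only floats; math.isnan() raises TypeError for values such as None or strings.

```python
import math

values = [1.0, float("nan"), 2.5, float("nan"), 4.0]

# List comprehension: keep x only when it is not NaN
no_nan = [x for x in values if not math.isnan(x)]

# Same result with filter(); the lambda returns True for the values to keep
no_nan_filter = list(filter(lambda x: not math.isnan(x), values))

print(no_nan)  # [1.0, 2.5, 4.0]
```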

You can mask your array with the numpy.ma.array function and then apply NumPy operations, which skip the masked (NaN) entries:

import numpy as np

a = np.random.rand(10)            # Generate random data.
a = np.where(a > 0.8, np.nan, a)  # Set all data larger than 0.8 to NaN

a = np.ma.array(a, mask=np.isnan(a)) # Use a mask to mark the NaNs

a_norm  = a / np.sum(a) # The sum function ignores the masked values.
a_norm2 = a / np.std(a) # The std function ignores the masked values.

You can still access your raw data:

print(a.data)
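The question asks for a plain array with the NaN values back in their original positions. One way to get that from a masked result is MaskedArray.filled(np.nan). A sketch with fixed sample data instead of random numbers, using np.ma.masked_invalid (which masks NaN and inf) as a shorthand for the explicit mask above:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, 4.0])
m = np.ma.masked_invalid(x)  # masks the NaN entry

m_norm = m / m.sum()         # sum over unmasked values only: 1 + 3 + 4 = 8
out = m_norm.filled(np.nan)  # plain ndarray; masked slots become NaN again

print(out)
```

Here `out` holds 0.125, NaN, 0.375 and 0.5, so the NaN stays at index 1 and can be imputed later.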

You can use numpy.nansum to compute the norm while ignoring NaN values:

In [54]: x
Out[54]: array([  1.,   2.,  nan,   3.])

Here's the norm with nan ignored:

In [55]: np.sqrt(np.nansum(np.square(x)))
Out[55]: 3.7416573867739413

y is the normalized array:

In [56]: y = x / np.sqrt(np.nansum(np.square(x)))

In [57]: y
Out[57]: array([ 0.26726124,  0.53452248,         nan,  0.80178373])

In [58]: np.linalg.norm(y[~np.isnan(y)])
Out[58]: 1.0
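The same nansum trick extends to 2-D data. The sketch below is a hypothetical helper, `nan_l2_normalize` (not a NumPy or scikit-learn function), modeled on the default behavior of sklearn.preprocessing.normalize (L2 norm, one row at a time). It computes each norm from the non-NaN entries only and leaves NaN in place:

```python
import numpy as np

def nan_l2_normalize(X, axis=1):
    """Hypothetical helper: scale each row (axis=1) or column (axis=0) of X
    to unit L2 norm, ignoring NaN in the norm and keeping NaN in place."""
    X = np.asarray(X, dtype=float)
    # nansum treats NaN as 0, so NaN entries drop out of the sum of squares
    norms = np.sqrt(np.nansum(np.square(X), axis=axis, keepdims=True))
    # Leave all-zero or all-NaN rows/columns unchanged instead of dividing by 0
    norms[norms == 0] = 1.0
    return X / norms

X = np.array([[1.0, 2.0, np.nan, 3.0],
              [3.0, np.nan, 4.0, 0.0]])
Y = nan_l2_normalize(X)
print(Y)
```

The first row reproduces the `y` from the session above (norm sqrt(14)), and the second row becomes 0.6, NaN, 0.8, 0.0 (norm 5). Rows with no finite values come back unchanged, so they stay all-NaN.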


Tags: python, numpy, scipy, scikit-learn

xxx222


2 Answers

Chiel

Warren Weckesser
