Say I have a numpy array that has some float('nan'), I don't want to impute those data now and I want to first normalize those and keep the NaN data at the original space, is there any way I can do that?
Previously I used normalize
function in sklearn.Preprocessing
, but that function seems can't take any NaN contained array as input.
isnan() Remove NaN values from a given NumPy. Combining the ~ operator instead of numpy. logical_not() with numpy. isnan() function.
To normalize a 2D-Array or matrix we need NumPy library. For matrix, general normalization is using The Euclidean norm or Frobenius norm. Here, v is the matrix and |v| is the determinant or also called The Euclidean norm. v-cap is the normalized matrix.
How to drop all missing values from a numpy array? Droping the missing values or nan values can be done by using the function "numpy. isnan()" it will give us the indexes which are having nan values and when combined with other function which is "numpy. logical_not()" where the boolean values will be reversed.
To remove NaN from a list using Python, the easiest way is to use the isnan() function from the Python math module and list comprehension. You can also use the Python filter() function. The Python numpy module also provides an isnan() function that we can use to check if a value is NaN.
You can mask your array using the numpy.ma.array
function and subsequently apply any numpy
operation:
import numpy as np
a = np.random.rand(10) # Generate random data.
a = np.where(a > 0.8, np.nan, a) # Set all data larger than 0.8 to NaN
a = np.ma.array(a, mask=np.isnan(a)) # Use a mask to mark the NaNs
a_norm = a / np.sum(a) # The sum function ignores the masked values.
a_norm2 = a / np.std(a) # The std function ignores the masked values.
You can still access your raw data:
print a.data
You can use numpy.nansum
to compute the norm and ignore nan:
In [54]: x
Out[54]: array([ 1., 2., nan, 3.])
Here's the norm with nan
ignored:
In [55]: np.sqrt(np.nansum(np.square(x)))
Out[55]: 3.7416573867739413
y
is the normalized array:
In [56]: y = x / np.sqrt(np.nansum(np.square(x)))
In [57]: y
Out[57]: array([ 0.26726124, 0.53452248, nan, 0.80178373])
In [58]: np.linalg.norm(y[~np.isnan(y)])
Out[58]: 1.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With