I'd like to normalize my training set before passing it to my NN so instead of doing it manually (subtract mean and divide by std), I tried keras.utils.normalize()
and I am amazed about the results I got.
Running this:
r = np.random.rand(3000) * 1000
nr = normalize(r)
print(np.mean(r))
print(np.mean(nr))
print(np.std(r))
print(np.std(nr))
print(np.min(r))
print(np.min(nr))
print(np.max(r))
print(np.max(nr))
Results in that:
495.60440066771866
0.015737914577213984
291.4440194021
0.009254802974329002
0.20755517410064872
6.590913227674956e-06
999.7631481267636
0.03174747238214018
Unfortunately, the docs don't explain what's happening under the hood. Can you please explain what it does and if I should use keras.utils.normalize
instead of what I would have done manually?
The normalize function just performs a regular normalization to improve performance: Normalization is a rescaling of the data from the original range so that all values are within the range of 0 and 1. There is a nice explanation of the axis argument in another post: What is the meaning of axis=-1 in keras.
Normalization classA preprocessing layer which normalizes continuous features. This layer will shift and scale inputs into a distribution centered around 0 with standard deviation 1. It accomplishes this by precomputing the mean and variance of the data, and calling (input - mean) / sqrt(var) at runtime.
Tensorflow normalize is the method available in the tensorflow library that helps to bring out the normalization process for tensors in neural networks. The main purpose of this process is to bring the transformation so that all the features work on the same or similar level of scale.
“Normalizing” a vector most often means dividing by a norm of the vector. It also often refers to rescaling by the minimum and range of the vector, to make all the elements lie between 0 and 1 thus bringing all the values of numeric columns in the dataset to a common scale.
It is not the kind of normalization you expect. Actually, it uses np.linalg.norm()
under the hood to normalize the given data using Lp-norms:
def normalize(x, axis=-1, order=2):
"""Normalizes a Numpy array.
# Arguments
x: Numpy array to normalize.
axis: axis along which to normalize.
order: Normalization order (e.g. 2 for L2 norm).
# Returns
A normalized copy of the array.
"""
l2 = np.atleast_1d(np.linalg.norm(x, order, axis))
l2[l2 == 0] = 1
return x / np.expand_dims(l2, axis)
For example, in the default case, it would normalize the data using L2-normalization (i.e. the sum of squared of elements would be equal to one).
You can either use this function, or if you don't want to do mean and std normalization manually, you can use StandardScaler()
from sklearn or even MinMaxScaler()
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With