
Normalise 2D Numpy Array: Zero Mean Unit Variance

Tags: python, numpy

I have a 2D NumPy array in which I want to normalise each column to zero mean and unit variance. Since I'm primarily used to C++, my current approach is to loop over the elements of a column, do the necessary operations, and then repeat this for every column. I would like to know a more Pythonic way to do this.

Let class_input_data be my 2D array. I can get the column mean as:

column_mean = numpy.sum(class_input_data, axis = 0)/class_input_data.shape[0]

I then subtract the mean from all columns by:

class_input_data = class_input_data - column_mean

By now, the data should be zero mean. However, the value of:

numpy.sum(class_input_data, axis = 0)

isn't equal to 0, implying that I have done something wrong in my normalisation. By "isn't equal to 0", I don't mean very small numbers that can be attributed to floating point inaccuracies.
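The question does not show the data itself, so the following is only a sketch of how the centring step could be checked. The random array standing in for class_input_data, the cast to float, and the tolerance passed to np.allclose are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for class_input_data; the real data is not shown.
rng = np.random.default_rng(0)
class_input_data = rng.normal(loc=5.0, scale=2.0, size=(100, 4))

# Cast to float so the mean and the subtraction are done in floating point
# (an assumption: the dtype of the original array is not stated).
data = np.asarray(class_input_data, dtype=float)

column_mean = data.mean(axis=0)   # one mean per column, shape (4,)
centred = data - column_mean      # broadcasting subtracts each column's mean

# Compare the column sums against zero with a tolerance instead of ==.
column_sums = centred.sum(axis=0)
is_zero_mean = np.allclose(column_sums, 0.0, atol=1e-9)
print(is_zero_mean)
```

If a check like this fails on real data, inspecting data.dtype and data.shape may be a reasonable first step.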

therainmaker asked Jul 01 '15 05:07


People also ask

How do you normalize data to zero mean and unit variance?

You can determine the mean of the signal, and just subtract that value from all the entries. That will give you a zero mean result. To get unit variance, determine the standard deviation of the signal, and divide all entries by that value.

How do you normalize a NumPy array to a unit vector?

You can normalize a NumPy array to a unit vector by dividing it by its norm (for example, computed with numpy.linalg.norm), or by using the sklearn.preprocessing.normalize() function from scikit-learn. Normalized inputs often help machine learning algorithms perform better. A unit vector is a vector that has a magnitude of 1.
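As a minimal sketch of the unit-vector idea using only NumPy, the example below divides a vector by its Euclidean norm. The vector values are arbitrary and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Arbitrary example vector with Euclidean length 5 (3-4-5 triangle).
v = np.array([3.0, 4.0])

# np.linalg.norm returns the Euclidean (L2) norm for a 1D array by default.
length = np.linalg.norm(v)

# Dividing by the length gives a vector of magnitude 1.
unit_v = v / length
print(unit_v)  # [0.6 0.8]
```

A zero vector has norm 0 and cannot be normalized this way, so real code would need to handle that case separately.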

How do you normalize a 2D array?

To normalize a 2D array or matrix, you can use the NumPy library. For a matrix, a common choice is to divide by the Frobenius norm, which is the Euclidean norm of the matrix's entries. Here, v is the matrix and ‖v‖ is its norm (not its determinant); v-cap, equal to v / ‖v‖, is the normalized matrix.
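A small sketch of normalizing a matrix by its Frobenius norm is shown below. The matrix is an arbitrary example, picked so that its determinant is 0 while its Frobenius norm is 5, which illustrates that the two quantities are different things.

```python
import numpy as np

# Arbitrary example matrix: the entries' squares sum to 1 + 4 + 4 + 16 = 25.
m = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# For a 2D array, np.linalg.norm defaults to the Frobenius norm,
# i.e. the square root of the sum of squared entries.
fro = np.linalg.norm(m)

# The determinant of this matrix is 0 (its rows are proportional).
det = np.linalg.det(m)

# Dividing by the Frobenius norm gives a matrix whose Frobenius norm is 1.
m_normalized = m / fro
```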

How do you normalize an array so the values range exactly between 0 and 1?

You can normalize data to the range 0 to 1 by using the formula (data - np.min(data)) / (np.max(data) - np.min(data)).
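The min-max formula can be sketched as follows. The input values are arbitrary, and the sketch assumes the data is not constant, since a constant array would make the denominator zero.

```python
import numpy as np

# Arbitrary example data; assumed to contain at least two distinct values.
data = np.array([2.0, 4.0, 6.0, 10.0])

# Shift so the minimum becomes 0, then divide by the range so the maximum becomes 1.
scaled = (data - np.min(data)) / (np.max(data) - np.min(data))
print(scaled)  # [0.   0.25 0.5  1.  ]
```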


1 Answer

Something like:

import numpy as np

eg_array = 5 + (np.random.randn(10, 10) * 2)
normed = (eg_array - eg_array.mean(axis=0)) / eg_array.std(axis=0)

normed.mean(axis=0)
Out[14]: 
array([  1.16573418e-16,  -7.77156117e-17,  -1.77635684e-16,
         9.43689571e-17,  -2.22044605e-17,  -6.09234885e-16,
        -2.22044605e-16,  -4.44089210e-17,  -7.10542736e-16,
         4.21884749e-16])

normed.std(axis=0)
Out[15]: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
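If this is needed in more than one place, the same expression could be wrapped in a small helper. The sketch below is not part of the original answer: the function name standardise_columns, the ddof parameter, and the choice to leave constant columns at zero rather than dividing by a zero standard deviation are all assumptions.

```python
import numpy as np

def standardise_columns(x, ddof=0):
    """Return a copy of x with each column shifted to zero mean and
    scaled to unit standard deviation (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0, ddof=ddof)
    # Constant columns have std == 0; divide by 1 instead so they become
    # all zeros rather than NaN (a design choice, not the only option).
    safe_std = np.where(std == 0, 1.0, std)
    return (x - mean) / safe_std

# Small example: the third column is constant.
example = np.array([[1.0, 10.0, 7.0],
                    [2.0, 20.0, 7.0],
                    [3.0, 30.0, 7.0]])
normed_example = standardise_columns(example)
```

With the default ddof=0 this matches eg_array.std(axis=0) above; passing ddof=1 would use the sample standard deviation instead.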
Marius answered Sep 18 '22 15:09