Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing zero mean and unit variance in numpy

I am given a definition of a function and asked to implement it as follows:

# Problem 1 - Apply zero mean and zero variance scale to the image features
def normalize(data):
    pass

Then provided with a unit test using numpy that would assert the success of my implementation

EDIT

This not my unit test, but assigned by the instructor of the course.

np.testing.assert_array_almost_equal(
    normalize(np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])),
    np.array([-0.4, -0.3, -0.2, -0.099, 0.0, 0.099, 0.199, 0.3, 0.4, 0.5]),
    decimal=3)

My solution is

def normalize(data):
    return ((data - data.mean()) / data.max() - data.min())

But there must be a gap in my understanding because I'm getting

AssertionError: 
Arrays are not almost equal to 3 decimals

(mismatch 100.0%)
 x: array([-1.45, -1.35, -1.25, -1.15, -1.05, -0.95, -0.85, -0.75, -0.65, -0.55])
 y: array([-0.4  , -0.3  , -0.2  , -0.099,  0.   ,  0.099,  0.199,  0.3  ,
        0.4  ,  0.5  ])

With googling I'm also finding

(data - data.mean()) / data.std()

After more searching I attempted

(data - data.mean()) / data.var()

But neither solution asserts correctly.

So what's the correct implementation here?

like image 406
Sam Hammamy Avatar asked Apr 23 '26 17:04

Sam Hammamy


1 Answers

First you probably intended to do this:

(data - data.mean()) / (data.max() - data.min())

instead of this:

((data - data.mean()) / data.max() - data.min())

Then it's is just not the correct definition for normalising. Instead, as you found by searching around, the correct definition is:

(data - data.mean()) / data.std()

And finally your unit test is just wrong, so it fails when called with the correct definition.

like image 158
Julien Avatar answered Apr 25 '26 06:04

Julien