Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalization VS. numpy way to normalize?

I'm supposed to normalize an array. I've read about normalization and come across a formula:

enter image description here

I wrote the following function for it:

def normalize_list(list):
    max_value = max(list)
    min_value = min(list)
    for i in range(0, len(list)):
        list[i] = (list[i] - min_value) / (max_value - min_value)

That is supposed to normalize an array of elements.

Then I have come across this: https://stackoverflow.com/a/21031303/6209399 Which says you can normalize an array by simply doing this:

def normalize_list_numpy(list):
    normalized_list = list / np.linalg.norm(list)
    return normalized_list

If I normalize this test array test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9] with my own function and with the numpy method, I get these answers:

My own function: [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
The numpy way: [0.059234887775909233, 0.11846977555181847, 0.17770466332772769, 0.23693955110363693, 0.29617443887954614, 0.35540932665545538, 0.41464421443136462, 0.47387910220727386, 0.5331139899831830

Why do the functions give different answers? Is there others way to normalize an array of data? What does numpy.linalg.norm(list) do? What do I get wrong?

like image 647
OuuGiii Avatar asked Oct 24 '17 16:10

OuuGiii


People also ask

How do I normalize in NumPy?

In order to normalize a vector in NumPy, we can use the np. linalg. norm() function, which returns the vector's norm value. We can then use the norm value to divide each value in the array to get the normalized array.

How do you normalize a dataset in Python?

Using MinMaxScaler() to Normalize Data in Python This is a more popular choice for normalizing datasets. You can see that the values in the output are between (0 and 1). MinMaxScaler also gives you the option to select feature range. By default, the range is set to (0,1).


1 Answers

The question/answer that you reference doesn't explicitly relate your own formula to the np.linalg.norm(list) version that you use here.

One NumPy solution would be this:

import numpy as np
def normalize(x):
    x = np.asarray(x)
    return (x - x.min()) / (np.ptp(x))

print(normalize(test_array))    
# [ 0.     0.125  0.25   0.375  0.5    0.625  0.75   0.875  1.   ]

Here np.ptp is peak-to-peak ie

Range of values (maximum - minimum) along an axis.

This approach scales the values to the interval [0, 1] as pointed out by @phg.

The more traditional definition of normalization would be to scale to a 0 mean and unit variance:

x = np.asarray(test_array)
res = (x - x.mean()) / x.std()
print(res.mean(), res.std())
# 0.0 1.0

Or use sklearn.preprocessing.normalize as a pre-canned function.

Using test_array / np.linalg.norm(test_array) creates a result that is of unit length; you'll see that np.linalg.norm(test_array / np.linalg.norm(test_array)) equals 1. So you're talking about two different fields here, one being statistics and the other being linear algebra.

like image 76
Brad Solomon Avatar answered Sep 29 '22 01:09

Brad Solomon