How to normalize a vector to a corresponding unit vector in NumPy?

A unit vector is a vector with a magnitude of one. To normalize a vector to the corresponding unit vector, divide it by its norm (its length). The numpy.linalg submodule contains many functions relevant to linear algebra, and its norm() function computes the norm of a vector.
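A minimal sketch of norm() with a few common ord values, using an arbitrary example vector. Dividing by the chosen norm gives a vector of length 1 under that norm:

```python
import numpy as np

v = np.array([3.0, -4.0])  # arbitrary example vector

# For a 1-D array, np.linalg.norm defaults to the Euclidean (L2) norm
l2 = np.linalg.norm(v)                 # sqrt(3**2 + 4**2) = 5.0
l1 = np.linalg.norm(v, ord=1)          # |3| + |-4| = 7.0
linf = np.linalg.norm(v, ord=np.inf)   # max(|3|, |-4|) = 4.0

# Dividing by the L2 norm gives the corresponding unit vector
unit_l2 = v / l2                       # array([ 0.6, -0.8])
print(l2, l1, linf, unit_l2)
```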

How to normalize a NumPy array using sklearn?

The first option for normalizing a NumPy array is the sklearn.preprocessing.normalize() method, which scales input vectors individually to unit norm (vector length). Note that by default, the norm used to normalize the input is 'l2'.
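A short sketch of sklearn.preprocessing.normalize on a small, made-up 2-D array, assuming scikit-learn is installed. normalize works on 2-D input and by default scales each row (axis=1); as far as I know, all-zero rows are returned unchanged rather than turned into NaN:

```python
import numpy as np
from sklearn.preprocessing import normalize

# Made-up example data: three rows, the last one all zeros
X = np.array([[3.0, 4.0],
              [1.0, 1.0],
              [0.0, 0.0]])

# Default norm='l2', axis=1: each row is scaled to unit Euclidean length
X_l2 = normalize(X)

# norm='l1': each row is scaled so its absolute values sum to 1
X_l1 = normalize(X, norm='l1')
print(X_l2)
print(X_l1)
```

A 1-D array has to be reshaped to 2-D first, for example with x[:, np.newaxis] or x.reshape(1, -1).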

What is normalization in NumPy?

Vectors have both a magnitude and a direction. Normalization rescales a vector to unit magnitude while keeping its direction. To normalize, we compute a value called the “norm” of the vector (its magnitude) and divide the vector by it. For functions related to linear algebra, NumPy has a dedicated submodule named linalg.
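To make the magnitude/direction split concrete, a small sketch with an arbitrary vector: the norm is the square root of the sum of squares, and the vector equals its magnitude times its unit direction:

```python
import numpy as np

v = np.array([2.0, 2.0, 1.0])  # arbitrary example vector

magnitude = np.linalg.norm(v)   # sqrt(4 + 4 + 1) = 3.0
direction = v / magnitude       # unit vector pointing the same way as v

# The manual formula sqrt(sum of squares) gives the same L2 norm
manual = np.sqrt(np.sum(v ** 2))

# Magnitude times direction recovers the original vector
reconstructed = magnitude * direction
print(magnitude, direction, reconstructed)
```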

How to normalize a NumPy array to a unit vector?


How do you normalize a NumPy array?

To normalize a 2-D array or matrix with NumPy, the usual choice is the Euclidean norm of all its entries, also called the Frobenius norm. The normalized matrix is $\hat{v} = v / \lVert v \rVert_F$, where v is the matrix and $\lVert v \rVert_F = \sqrt{\sum_{i,j} v_{ij}^2}$ is its Frobenius norm (not its determinant).
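A sketch of Frobenius normalization with a made-up matrix, chosen so that its determinant is 0 while its Frobenius norm is 5, which shows that the two are different quantities:

```python
import numpy as np

# Made-up matrix: rows are linearly dependent, so the determinant is 0
M = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# For a 2-D array with no ord/axis, np.linalg.norm returns the Frobenius norm:
# sqrt(1 + 4 + 4 + 16) = 5.0 (same as np.linalg.norm(M, 'fro'))
fro = np.linalg.norm(M)
M_hat = M / fro

# The determinant is a different quantity entirely
det = np.linalg.det(M)
print(fro, det)
print(M_hat)
```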

How do you normalize a vector to a unit vector?

To normalize a vector is to take a vector of any length and, keeping it pointing in the same direction, change its length to 1, turning it into what is called a unit vector. Because a unit vector describes a direction without regard to length, it is useful to have it readily accessible.
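As a quick worked example (the vector is chosen only for round numbers):

```latex
v = (3, 4), \quad
\lVert v \rVert = \sqrt{3^2 + 4^2} = 5, \quad
\hat{v} = \frac{v}{\lVert v \rVert} = (0.6,\ 0.8), \quad
\lVert \hat{v} \rVert = \sqrt{0.36 + 0.64} = 1
```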

How do you find the unit vector in NumPy?

We can divide a vector by its norm to get its unit vector. First create the vector with the numpy.array() function, then compute the unit vector by dividing the vector by its norm (from numpy.linalg.norm()) and store the result in a variable such as unit_vector.
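The steps just described, sketched with an example vector (the values are arbitrary):

```python
import numpy as np

# Create the vector with numpy.array()
vector = np.array([1.0, 2.0, 2.0])

# Divide the vector by its (Euclidean) norm to get the unit vector;
# the norm here is sqrt(1 + 4 + 4) = 3.0
unit_vector = vector / np.linalg.norm(vector)
print(unit_vector)
```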

If you're using scikit-learn you can use sklearn.preprocessing.normalize:

import numpy as np
from sklearn.preprocessing import normalize

x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
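# normalize() expects a 2-D array, so treat x as a single column
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
# compare with a tolerance: the two routes may differ in the last bits
print(np.allclose(norm1, norm2))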
# True

I agree that it would be nice if such a function were part of NumPy itself, but as far as I know it isn't. So here is a vectorized version that works along an arbitrary axis and leaves all-zero slices unchanged instead of dividing by zero:

import numpy as np

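def normalized(a, axis=-1, order=2):
    # norm along the given axis; atleast_1d so a scalar result can be indexed
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    # avoid division by zero: all-zero slices are left unchanged
    l2[l2==0] = 1
    # restore the reduced axis so the division broadcasts against a
    return a / np.expand_dims(l2, axis)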

A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))

print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
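np.linalg.norm also accepts keepdims=True, which keeps the reduced axis with size 1 so the result broadcasts directly without np.expand_dims. A sketch of the same idea with a hypothetical helper named normalized_keepdims; one difference is that for a 1-D input it returns an array of the same shape as the input, whereas the version above returns shape (1, n):

```python
import numpy as np

def normalized_keepdims(a, axis=-1, order=2):
    # keepdims=True keeps the reduced axis with size 1,
    # so `norms` broadcasts against `a` as-is
    norms = np.linalg.norm(a, ord=order, axis=axis, keepdims=True)
    # Replace zero norms by 1 so all-zero slices stay zero instead of NaN
    norms = np.where(norms == 0, 1, norms)
    return a / norms

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3, 3))
print(normalized_keepdims(A, axis=1))
print(normalized_keepdims(np.arange(3)))
```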

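This might also work for you:

import numpy as np
v = np.array([3.0, 4.0])  # example vector
normalized_v = v / np.sqrt(np.sum(v**2))

but it fails when v is the zero vector: its norm is 0, so the division produces nan values (with a RuntimeWarning).

In that case, introducing a small constant to prevent the division by zero solves this.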
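A minimal sketch of that idea with a hypothetical helper, safe_normalize. Here the constant is used as a floor for the norm (np.maximum) rather than added to it, so vectors with a non-negligible norm are normalized exactly; the default eps=1e-12 is an arbitrary assumption:

```python
import numpy as np

def safe_normalize(v, eps=1e-12):
    # eps is an arbitrary small floor (assumption, tune as needed);
    # for the zero vector the denominator becomes eps, so the result is 0
    return v / np.maximum(np.linalg.norm(v), eps)

print(safe_normalize(np.array([3.0, 4.0])))  # unit vector
print(safe_normalize(np.zeros(3)))           # zeros, no NaN
```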

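You can specify ord to get the L1 norm. To avoid division by zero I use eps, but that may not be ideal.

import numpy as np

def normalize(v):
    # L1 norm: sum of absolute values
    norm = np.linalg.norm(v, ord=1)
    if norm == 0:
        # zero vector: use machine epsilon to avoid dividing by zero
        # (np.finfo requires v to have a floating-point dtype)
        norm = np.finfo(v.dtype).eps
    return v / norm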
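An alternative that avoids eps altogether is to let np.divide skip the zero denominators through its where argument; out must then be supplied so the skipped entries have a defined value. A sketch for row-wise L1 normalization of a made-up 2-D array:

```python
import numpy as np

# Made-up data: the last row is all zeros
X = np.array([[1.0, -3.0],
              [2.0, 2.0],
              [0.0, 0.0]])

# L1 norm of each row (sum of absolute values), kept as shape (3, 1)
row_l1 = np.linalg.norm(X, ord=1, axis=1, keepdims=True)

# Divide only where the norm is non-zero; other entries keep the value
# from `out`, here zeros, so all-zero rows stay zero
X_l1 = np.divide(X, row_l1, out=np.zeros_like(X), where=row_l1 != 0)
print(X_l1)
```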

If you have multidimensional data and want each column (axis 0) shifted so its minimum is 0 and then scaled to sum to 1 or to have a maximum of 1:

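import numpy as np

def normalize(_d, to_sum=True, copy=True):
    # _d is an (n x dimension) float np array; each column is normalized
    d = _d if not copy else np.copy(_d)
    # shift each column so its minimum is 0 (in place when copy=False)
    d -= np.min(d, axis=0)
    # divide each column by its sum, or by its range (max - min)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d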

This uses NumPy's peak-to-peak function, np.ptp (maximum minus minimum).

a = np.random.random((5, 3))

# copy=False modifies a in place, so b (and c below) are the same array as a
b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), each column sums to 1

c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1

You mentioned scikit-learn, so I want to share another solution.

scikit-learn `MinMaxScaler`

scikit-learn provides MinMaxScaler, which rescales each feature (column) to a given range: [0, 1] by default, configurable through its feature_range parameter. Note that this is min-max scaling, not scaling to a unit vector.

It also handles NaNs for us:

NaNs are treated as missing values: disregarded in fit, and maintained in transform. ... see reference [1]

Code sample

The code is simple:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Let's say X_train is your input DataFrame
# create a MinMaxScaler (default feature_range=(0, 1))
min_max_scaler = MinMaxScaler()
# fit on the underlying NumPy array and scale each column
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap the result back into a DataFrame if you need one
df = pd.DataFrame(X_train_norm, columns=X_train.columns, index=X_train.index)
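For intuition, the per-column rescaling MinMaxScaler performs can be approximated in plain NumPy. This is a hypothetical helper (min_max_scale), a rough sketch assuming a 2-D numeric array, not scikit-learn's implementation: np.nanmin and np.nanmax ignore NaNs when computing each column's range, and the NaNs then pass through the arithmetic unchanged, in line with the behaviour quoted above:

```python
import numpy as np

def min_max_scale(X, feature_min=0.0, feature_max=1.0):
    # Hypothetical helper: rough NumPy analogue of MinMaxScaler.fit_transform
    X = np.asarray(X, dtype=float)
    # nanmin/nanmax ignore NaNs, so missing values do not affect the range
    col_min = np.nanmin(X, axis=0)
    col_max = np.nanmax(X, axis=0)
    col_range = col_max - col_min
    # Constant columns would divide by zero; use 1 so they map to feature_min
    col_range[col_range == 0] = 1.0
    scaled = (X - col_min) / col_range
    # NaNs propagate through the arithmetic, so they are kept in the output
    return scaled * (feature_max - feature_min) + feature_min

data = np.array([[1.0, np.nan],
                 [3.0, 2.0],
                 [5.0, 4.0]])
print(min_max_scale(data))
```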

Reference

[1] sklearn.preprocessing.MinMaxScaler

There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:

import transformations as trafo
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

print(trafo.unit_vector(data, axis=1))
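If you would rather not add the dependency, plain NumPy should give the same row-wise result up to floating-point rounding. A sketch assuming no row is all zeros (a zero row would produce NaN):

```python
import numpy as np

data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])

# Divide each row by its Euclidean length; keepdims keeps shape (3, 1)
# so the division broadcasts across the columns
unit_rows = data / np.linalg.norm(data, axis=1, keepdims=True)
print(unit_rows)
```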

