
Can someone explain MaxAbsScaler in Scikit-learn?

I was reading the docs for MaxAbsScaler.

https://scikit-learn.org/stable/modules/preprocessing.html#scaling-features-to-a-range

I can't understand what exactly it does.

Here is an example:

>>> import numpy as np
>>> from sklearn import preprocessing
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
...
>>> max_abs_scaler = preprocessing.MaxAbsScaler()
>>> X_train_maxabs = max_abs_scaler.fit_transform(X_train)
>>> X_train_maxabs
array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])
>>> X_test = np.array([[ -3., -1.,  4.]])
>>> X_test_maxabs = max_abs_scaler.transform(X_test)
>>> X_test_maxabs                 
array([[-1.5, -1. ,  2. ]])
>>> max_abs_scaler.scale_         
array([2.,  1.,  2.])

The docs say that it scales the training data to lie within the range [-1, 1] by dividing each feature by its largest absolute value.

I think "each feature" means it works per column.

A simpler explanation would be great.

user12200428 asked Oct 14 '19 07:10


1 Answer

MaxAbsScaler scales each feature by its maximum absolute value. A feature here is a column of the input matrix X.


Here you have:

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

and you get:

array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])

Train set explanation:

The first feature in X_train is the first column, i.e. [1, 2, 0]. Its maximum absolute value is 2, so you divide every value in this column by 2. The new column becomes [0.5, 1, 0].

You do the same thing for the other 2 features/columns. For feature 2, the column is [-1, 0, 1] and the maximum absolute value is 1, so dividing by 1 leaves the column unchanged.

Finally, for the last feature you have a maximum absolute value of 2. So the final feature becomes [2/2 , 0/2 , -1/2] = [1, 0, -0.5].
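The column-by-column steps above can be reproduced by hand with plain NumPy. This is only a minimal sketch of the arithmetic, not scikit-learn's actual implementation; the names col_max_abs and X_train_manual are made up for illustration, and it assumes no column is entirely zero (otherwise the division would be by zero).

```python
import numpy as np

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

# Largest absolute value in each column (feature); axis=0 reduces over the rows.
col_max_abs = np.abs(X_train).max(axis=0)

# Broadcasting divides every column by its own maximum absolute value.
X_train_manual = X_train / col_max_abs

print(col_max_abs)
print(X_train_manual)
```

The divisors come out as 2, 1 and 2, and the scaled matrix matches the X_train_maxabs array shown in the question.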


Test set explanation:

Next, you define X_test = np.array([[ -3., -1., 4.]]). Here you have one sample with 3 features.

IMPORTANT: The scaler was fitted on the training set, so it uses the maximum absolute values of the training set and does not recompute them from the test set. This is why scaled test values can fall outside [-1, 1].

So you get: [ -3./2, -1./1, 4./2] = [-1.5, -1. , 2. ]

P.S.: The values 2, 1 and 2 used for the division come from the estimation on the training set. They are stored in max_abs_scaler.scale_.
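To check that the test sample really is divided by the training-set values, you can compare transform() with a manual division by scale_. A small sketch, assuming scikit-learn and NumPy are installed; the variable names scaler, X_test_scaled and X_test_manual are just for illustration.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
X_test = np.array([[-3., -1., 4.]])

# fit() learns the per-column maximum absolute values from X_train only.
scaler = MaxAbsScaler().fit(X_train)

# transform() divides by those stored values; X_test's own maxima are not used.
X_test_scaled = scaler.transform(X_test)

# The same division done by hand with the divisors learned from the training set.
X_test_manual = X_test / scaler.scale_

print(scaler.scale_)
print(X_test_scaled)
print(np.allclose(X_test_scaled, X_test_manual))
```

If the scaler had been refitted on X_test instead, the divisors would be 3, 1 and 4 and the result would be different, which is why you call fit (or fit_transform) on the training data and only transform on the test data.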

seralouk answered Oct 03 '22 18:10