 

What does preprocessing.scale() do? How does it work?

Python 3.5, preprocessing from sklearn

import numpy as np
import quandl
from sklearn import preprocessing

df = quandl.get('WIKI/GOOGL')
X = np.array(df)
X = preprocessing.scale(X)
0x Tps asked Feb 19 '17




2 Answers

preprocessing.scale() puts your data on one scale by standardizing it: each feature (column) is transformed to have zero mean and unit variance. This is helpful when your data is vastly spread out, with features on very different scales. For example, the values of X may look like this:

X = [1, 4, 400, 10000, 100000]

The issue with such a spread is that the data is, in statistical terms, skewed, and many learning algorithms are biased toward the features with the largest values. Scaling brings all your values onto one scale, eliminating that spread. As for how it works in mathematical detail, it follows the same concept as normalization and standardization: each value is replaced by z = (x - mean) / std, computed per column. You can research those to find out more, but to make life simpler, the sklearn function does everything for you!
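As a minimal sketch (not from the original answer, using a small made-up array), you can verify that preprocessing.scale() is just the per-column standardization formula z = (x - mean) / std:

```python
import numpy as np
from sklearn import preprocessing

# One feature column with vastly spread-out values
X = np.array([[1.0], [4.0], [400.0], [10000.0], [100000.0]])

# Standardize manually: subtract the column mean, divide by the
# population standard deviation (ddof=0, which is what scale() uses)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

scaled = preprocessing.scale(X)

print(np.allclose(manual, scaled))  # → True
```

Note that sklearn uses the population standard deviation (ddof=0), not the sample one, so a hand-rolled version with np.std(ddof=1) would differ slightly.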

Deepak M answered Oct 27 '22


Scaling the data brings all your values onto one scale, following the same concept as normalization and standardization. To see the effect, you can call describe() on the dataframe before and after processing:

import pandas

df.describe()

# X has already been pre-processed with preprocessing.scale()
df2 = pandas.DataFrame(X)
df2.describe()

You will see that df2 has a mean of 0 and a standard deviation of (approximately) 1 in each field. (describe() reports the sample standard deviation, so it will be slightly above 1 for small datasets.)
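A quick sketch of that check, using a small made-up dataframe since the WIKI/GOOGL feed from the question needs a Quandl API key:

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the Quandl dataframe
df = pd.DataFrame({"open": [10.0, 12.0, 11.0, 15.0],
                   "close": [11.0, 13.0, 10.0, 16.0]})

X = preprocessing.scale(np.array(df))
df2 = pd.DataFrame(X, columns=df.columns)

# Every column now has mean ~0 and (population) std ~1
print(df2.mean().round(6))
print(df2.std(ddof=0).round(6))
```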

T D Nguyen answered Oct 27 '22