Python 3.5, preprocessing from sklearn
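import numpy as np
import quandl
from sklearn import preprocessing
df = quandl.get('WIKI/GOOGL')
X = np.array(df)
X = preprocessing.scale(X)  # standardize each column: zero mean, unit variance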
Note that this is not the scale() function from the Python Wand ImageMagick library, which resizes an image to a given number of columns and rows (syntax: scale(columns, rows)). The two functions only share a name.
The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. In general, learning algorithms benefit from standardization of the data set.
Feature scaling is generally the last step in the data preprocessing pipeline, performed just before training the machine learning algorithms.
To apply standard scaling with Python, you can use the StandardScaler class from the sklearn.preprocessing module. Create a StandardScaler instance, call its fit_transform() method, and pass it the pandas DataFrame containing the features you want scaled.
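A minimal sketch of the StandardScaler approach, assuming a small made-up DataFrame (the column names `price` and `volume` and their values are hypothetical, not taken from the question):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical feature table; names and values are illustrative only.
features = pd.DataFrame({
    "price": [1.0, 4.0, 400.0, 10000.0, 100000.0],
    "volume": [10.0, 20.0, 30.0, 40.0, 50.0],
})

scaler = StandardScaler()
# fit_transform learns each column's mean and standard deviation,
# then returns a NumPy array with every column centered and scaled.
scaled = scaler.fit_transform(features)

# Wrap the result in a DataFrame again to keep the column labels.
scaled_df = pd.DataFrame(scaled, columns=features.columns, index=features.index)
print(scaled_df)
```

One reason to prefer StandardScaler over the one-shot preprocessing.scale() is that the fitted scaler can be reused: calling scaler.transform() on new data applies the mean and standard deviation learned from the training data.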
preprocessing.scale() puts your data on one common scale. This is helpful when the data is "sparse" in the loose sense of being vastly spread out (not in the usual sense of being mostly zeros). For example, the values of X may look like this:
X = [1, 4, 400, 10000, 100000]
The issue with values spread over such different magnitudes is that the largest ones dominate, so many learning algorithms end up biased toward them. Scaling brings all your values onto one comparable scale. Mathematically, scale() performs standardization: for each column it subtracts the mean and divides by the standard deviation. Normalization is a related idea that instead rescales values into a fixed range such as 0 to 1. You can read up on both for the details, but sklearn does the computation for you.
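The standardization step can be sketched by hand for the example values of X using only the standard library. This illustrates the formula (subtract the mean, divide by the population standard deviation); it is not scikit-learn's actual implementation.

```python
import statistics

X = [1, 4, 400, 10000, 100000]

mean = statistics.mean(X)
# Population standard deviation (divides by n, not n - 1),
# matching the estimator that scale() uses.
std = statistics.pstdev(X)

# z-score: how many standard deviations each value lies from the mean.
scaled = [(x - mean) / std for x in X]
print(scaled)
```

The scaled values keep their original ordering and relative spacing; only the location and the unit of measurement change.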
To see the effect, you can call describe() on the DataFrame before and after processing:
import pandas
df.describe()
# X has already been scaled with preprocessing.scale() above
df2 = pandas.DataFrame(X)
df2.describe()
You will see that df2 has a mean of approximately 0 and a standard deviation of approximately 1 in each column.
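The check can be reproduced without the Quandl download. The sketch below assumes a small made-up DataFrame (`raw`, with hypothetical columns `a` and `b`) in place of the question's data. One detail worth knowing: scale() divides by the population standard deviation (ddof=0), while pandas' describe() and std() report the sample standard deviation (ddof=1) by default, so on a small table the reported std is slightly above 1.

```python
import pandas as pd
from sklearn import preprocessing

# Illustrative data standing in for the question's DataFrame.
raw = pd.DataFrame({
    "a": [1.0, 4.0, 400.0, 10000.0, 100000.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
})

X = preprocessing.scale(raw.to_numpy())
df2 = pd.DataFrame(X, columns=raw.columns)

print(df2.describe())    # the "std" row uses ddof=1
print(df2.mean())        # approximately 0 in every column
print(df2.std(ddof=0))   # approximately 1 with the population formula
```

With thousands of rows, as in a stock price history, the difference between the two standard deviation formulas becomes negligible.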