What does preprocessing.scale() do? How does it work?

Tags: python, python-3.x, machine-learning, scikit-learn

Python 3.5, preprocessing from sklearn

import numpy as np
import quandl
from sklearn import preprocessing
df = quandl.get('WIKI/GOOGL')
X = np.array(df)
X = preprocessing.scale(X)

874

asked Feb 19 '17 08:02

0x Tps

2 Answers

The preprocessing.scale() function puts your data on one scale. This is helpful when the values in your dataset differ by orders of magnitude; in simple words, your data is vastly spread out. For example, the values of X may look like this:

X = [1, 4, 400, 10000, 100000]

The issue with such spread-out data is that values of very different magnitudes are hard to compare, and features with large values can dominate those with small ones. Scaling the data brings all your values onto one scale. Mathematically, it follows the concept of standardization (sometimes loosely called normalization): for each column, scale() subtracts the column mean and divides by the column standard deviation, z = (x - mean) / std, so every column ends up with mean 0 and standard deviation 1. But to make life simpler, the sklearn function does everything for you!
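As a minimal sketch of what standardization means (assuming NumPy and scikit-learn are installed), the snippet below subtracts the column mean, divides by the column standard deviation, and checks that the result matches preprocessing.scale() on the example values above, treated as a single feature column. The variable names are illustrative only.

```python
import numpy as np
from sklearn import preprocessing

# The example values, treated as one feature (a single column).
X = np.array([1, 4, 400, 10000, 100000], dtype=float).reshape(-1, 1)

# Manual standardization: subtract each column's mean and divide by its
# population standard deviation (ddof=0, NumPy's default).
mean = X.mean(axis=0)
std = X.std(axis=0)
X_manual = (X - mean) / std

# sklearn's scale() standardizes column-wise (axis=0) by default.
X_sklearn = preprocessing.scale(X)

print(np.allclose(X_manual, X_sklearn))  # True
print(X_sklearn.mean(axis=0), X_sklearn.std(axis=0))
```

scale() divides by the population standard deviation, which is why the manual version calls X.std(axis=0) without a ddof argument.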

134

answered Oct 27 '22 02:10

Deepak M

Scaling the data brings all your values onto one scale; it follows the concept of standardization. To see the effect, you can call describe() on the DataFrame before and after processing:

import pandas

df.describe()

# X has already been preprocessed with preprocessing.scale()
df2 = pandas.DataFrame(X)
df2.describe()

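You will see that every column of df2 has a mean of (approximately) 0 and a standard deviation of (approximately) 1. The standard deviation shown by describe() is slightly above 1 because describe() reports the sample standard deviation (ddof=1), whereas scale() divides by the population standard deviation (ddof=0); the two differ by a factor of sqrt(n/(n-1)), which is negligible for large datasets.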
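A self-contained sketch of this check, assuming synthetic data in place of quandl.get('WIKI/GOOGL') (which needs network access); the column names price and volume are hypothetical. It also shows how the standard deviation reported by describe() relates to the one scale() uses.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for the quandl data: two columns with very
# different ranges.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(500, 900, size=100),
    "volume": rng.uniform(1e6, 5e6, size=100),
})
print(df.describe())

X = preprocessing.scale(df.to_numpy())
df2 = pd.DataFrame(X, columns=df.columns)
print(df2.describe())

# describe() reports the sample std (ddof=1); scale() uses ddof=0.
n = len(df2)
print(df2.std(ddof=0))                         # about 1.0 per column
print(df2.std(ddof=1), np.sqrt(n / (n - 1)))   # both about 1.005
```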

answered Oct 27 '22 00:10

T D Nguyen
