How to scale dataframes consistently with MinMaxScaler() in sklearn

Tags: python, scale, scikit-learn

I have three data frames that are each scaled individually with MinMaxScaler().

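from sklearn.preprocessing import MinMaxScaler

def scale_dataframe(values_to_be_scaled):
    values = values_to_be_scaled.astype('float64')
    scaler = MinMaxScaler(feature_range=(0, 1))
    # fit_transform learns the column min/max from this one array only
    scaled = scaler.fit_transform(values)

    return scaled

# df holds the dataframes; each one is scaled on its own
scaled_values = []
for i in range(0, num_df):
    scaled_values.append(scale_dataframe(df[i].values))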

The problem I am having is that each dataframe gets scaled according to its own individual set of column min and max values. I need all of my dataframes to scale to the same values as if they all shared the same set of column min and max values for the data overall. Is there a way to accomplish this with MinMaxScaler()? One option would be to make one large dataframe, then scale the dataframe before partitioning, but this would not be ideal.

613

asked Dec 09 '17 19:12

xjtc55

1 Answer

Check out the excellent docs of sklearn.

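As the docs show, MinMaxScaler supports partial_fit(). Each call updates the scaler's running per-column min and max, so you can fit incrementally (online/minibatch scaling) and choose the batches yourself, for example one dataframe per batch.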
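To see what partial_fit() keeps track of, here is a minimal sketch that inspects the fitted attributes data_min_ and data_max_ after each call. The two one-row batches are made-up illustration data, not taken from the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# First batch: the stored min and max are both just this row.
scaler.partial_fit(np.array([[1.0, 20.0]]))
min_after_first = scaler.data_min_.copy()
max_after_first = scaler.data_max_.copy()

# Second batch: each column's min/max is widened only where needed.
scaler.partial_fit(np.array([[10.0, 2.0]]))
min_after_second = scaler.data_min_.copy()
max_after_second = scaler.data_max_.copy()

print(min_after_first, max_after_first)
print(min_after_second, max_after_second)
```

After the second call the scaler holds the min and max over both batches, which is why the order and size of the batches should not change the final scaling.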

Example:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

a = np.array([[1,2,3]])
b = np.array([[10,20,30]])
c = np.array([[5, 10, 15]])

""" Scale on all datasets together in one batch """
offline_scaler = MinMaxScaler()
offline_scaler.fit(np.vstack((a, b, c)))                # fit on whole data at once
a_offline_scaled = offline_scaler.transform(a)
b_offline_scaled = offline_scaler.transform(b)
c_offline_scaled = offline_scaler.transform(c)
print('Offline scaled')
print(a_offline_scaled)
print(b_offline_scaled)
print(c_offline_scaled)

""" Scale on all datasets together in minibatches """
online_scaler = MinMaxScaler()
online_scaler.partial_fit(a)                            # partial fit 1
online_scaler.partial_fit(b)                            # partial fit 2
online_scaler.partial_fit(c)                            # partial fit 3
a_online_scaled = online_scaler.transform(a)
b_online_scaled = online_scaler.transform(b)
c_online_scaled = online_scaler.transform(c)
print('Online scaled')
print(a_online_scaled)
print(b_online_scaled)
print(c_online_scaled)

Output:

Offline scaled
[[ 0.  0.  0.]]
[[ 1.  1.  1.]]
[[ 0.44444444  0.44444444  0.44444444]]
Online scaled
[[ 0.  0.  0.]]
[[ 1.  1.  1.]]
[[ 0.44444444  0.44444444  0.44444444]]
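Applied to the dataframes from the question, the same idea might look like the sketch below. The helper name scale_dataframes_together and the sample frames are hypothetical, and the sketch assumes every dataframe has the same columns in the same order.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def scale_dataframes_together(frames):
    """Scale every frame with one shared set of per-column min/max values.

    Hypothetical helper; assumes all frames share the same column layout.
    """
    scaler = MinMaxScaler(feature_range=(0, 1))
    # Pass 1: accumulate the overall column min/max, one frame per batch.
    for frame in frames:
        scaler.partial_fit(frame.to_numpy(dtype="float64"))
    # Pass 2: transform each frame with the shared statistics,
    # keeping its original index and column labels.
    return [
        pd.DataFrame(
            scaler.transform(frame.to_numpy(dtype="float64")),
            index=frame.index,
            columns=frame.columns,
        )
        for frame in frames
    ]

# Made-up sample data for illustration only.
frames = [
    pd.DataFrame({"x": [1, 2], "y": [10, 20]}),
    pd.DataFrame({"x": [5, 9], "y": [30, 50]}),
]
scaled_frames = scale_dataframes_together(frames)
for scaled in scaled_frames:
    print(scaled)
```

This avoids building one large dataframe: only the scaler's per-column statistics are shared between the frames.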

147

answered Oct 02 '22 15:10

sascha
