Scaling / Normalizing pandas column

I have a dataframe like:

TOTAL | Name
3232     Jane
382      Jack
8291     Jones

I'd like to create a new scaled column in the dataframe called SIZE, where SIZE is a number between 10 and 50.

For Example:

TOTAL | Name | SIZE
3232     Jane   24.413
382      Jack   10
8291     Jones  50

I've tried

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

scaler=MinMaxScaler(feature_range=(10,50))
df["SIZE"]=scaler.fit_transform(df["TOTAL"])

but got the error: Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I've also tried other approaches, such as creating a list, transforming it, and appending it back to the dataframe.

What is the easiest way to do this?

Thanks!

machump asked Apr 25 '18 17:04


2 Answers

Option 1
sklearn
This problem comes up time and time again, and the error message tells you what to do: fit_transform expects a 2-D input, but df["TOTAL"] is a 1-D Series. Change df["TOTAL"] to df[["TOTAL"]], which selects a one-column DataFrame.

df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])

df
   TOTAL   Name       SIZE
0   3232   Jane  24.413959
1    382   Jack  10.000000
2   8291  Jones  50.000000
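
To see why the double brackets matter, here is a minimal sketch comparing the shapes involved. It assumes the three-row frame from the question, rebuilt by hand; the variable names are only for illustration.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})

series_1d = df["TOTAL"]    # single brackets: a 1-D Series
frame_2d = df[["TOTAL"]]   # double brackets: a one-column DataFrame (2-D)
array_2d = df["TOTAL"].to_numpy().reshape(-1, 1)  # 2-D NumPy array with the same shape

print(series_1d.shape)  # (3,)
print(frame_2d.shape)   # (3, 1)
print(array_2d.shape)   # (3, 1)
```

Either 2-D form should be accepted by fit_transform; the 1-D Series is what triggers the reshape error.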

Option 2
pandas
Preferably, I would bypass sklearn and just do the min-max scaling myself.

a, b = 10, 50
x, y = df.TOTAL.min(), df.TOTAL.max()
df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a

df
   TOTAL   Name       SIZE
0   3232   Jane  24.413959
1    382   Jack  10.000000
2   8291  Jones  50.000000

This is essentially what the min-max scaler does, but without the overhead of importing scikit-learn. It's a heavy library, so don't import it unless you have to.
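
If you need this in more than one place, the same formula can be wrapped in a small function. This is only a sketch: min_max_scale is a hypothetical helper name, not a pandas or scikit-learn API, and returning the lower bound for a constant column is an arbitrary choice made here to avoid dividing by zero.

```python
import pandas as pd


def min_max_scale(values: pd.Series, low: float, high: float) -> pd.Series:
    """Linearly map `values` onto [low, high] (hypothetical helper)."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Constant column: the formula would divide by zero,
        # so return `low` for every row (an assumption of this sketch).
        return pd.Series(float(low), index=values.index)
    return (values - lo) / (hi - lo) * (high - low) + low


# Sample frame reconstructed from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})
df["SIZE"] = min_max_scale(df["TOTAL"], 10, 50)
print(df)
```

For the sample data this gives the same SIZE column as the output shown above.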

cs95 answered Nov 19 '22 02:11


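If you want to scale only one column of the dataframe, reshape the column's values into a 2-D array first. Pass feature_range=(10, 50) to get the range asked for in the question; by default MinMaxScaler scales to (0, 1).

from sklearn.preprocessing import MinMaxScaler

# feature_range sets the output bounds; the default is (0, 1)
scaler = MinMaxScaler(feature_range=(10, 50))
# reshape(-1, 1) turns the 1-D values into a single-feature 2-D array
df['SIZE'] = scaler.fit_transform(df['TOTAL'].values.reshape(-1, 1))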

Yahia answered Nov 19 '22 02:11