I have a dataframe like:
TOTAL | Name
3232 | Jane
382 | Jack
8291 | Jones
I'd like to add a scaled column called SIZE to the dataframe, where SIZE is a number between 10 and 50.
For example:
TOTAL | Name | SIZE
3232 | Jane | 24.413
382 | Jack | 10
8291 | Jones | 50
I've tried:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
scaler = MinMaxScaler(feature_range=(10, 50))
df["SIZE"] = scaler.fit_transform(df["TOTAL"])
but got this error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I've tried other things, such as creating a list, transforming it, and appending it back to the dataframe, among other things.
What is the easiest way to do this?
Thanks!
Option 1: sklearn
This problem comes up time and time again, and the error message does tell you what to do: the scaler expects 2D input, but df["TOTAL"] is a 1D Series. Change df["TOTAL"] to df[["TOTAL"]], which selects a one-column DataFrame instead.
df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])
df
TOTAL Name SIZE
0 3232 Jane 24.413959
1 382 Jack 10.000000
2 8291 Jones 50.000000
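To see why the double brackets matter, it can help to compare the shapes of the two selections. This is only a minimal sketch, assuming the three-row dataframe from the question (rebuilt here by hand); the names series_1d and frame_2d are made up for illustration.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})

series_1d = df["TOTAL"]    # single brackets: a 1D Series
frame_2d = df[["TOTAL"]]   # double brackets: a one-column 2D DataFrame

print(series_1d.shape)  # (3,)
print(frame_2d.shape)   # (3, 1)
```

The scaler wants the second shape, one row per sample and one column per feature.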
Option 2: pandas
Preferably, I would bypass sklearn and just do the min-max scaling myself.
a, b = 10, 50
x, y = df.TOTAL.min(), df.TOTAL.max()
df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a
df
TOTAL Name SIZE
0 3232 Jane 24.413959
1 382 Jack 10.000000
2 8291 Jones 50.000000
This is essentially what the min-max scaler does, but without the overhead of importing scikit-learn (it's a heavy library, so don't import it unless you have to).
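If you need this in more than one place, the same formula could be wrapped in a small helper. The function name min_max_scale and its handling of a constant column are assumptions of this sketch, not part of pandas or of the answer above.

```python
import pandas as pd


def min_max_scale(values: pd.Series, low: float, high: float) -> pd.Series:
    """Linearly rescale `values` so its min maps to `low` and its max to `high`."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Constant column: the formula would divide by zero.
        # Returning `low` everywhere is an arbitrary choice for this sketch.
        return pd.Series(float(low), index=values.index)
    return (values - lo) / (hi - lo) * (high - low) + low


# Rebuild the example dataframe from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})
df["SIZE"] = min_max_scale(df["TOTAL"], 10, 50)
print(df)
```

The guard for hi == lo is the one thing the one-liner does not cover; without it a column where every value is equal would come out as NaN.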
If you want to scale only one column in the dataframe, you can also reshape the column values into a 2D array as follows:
from sklearn.preprocessing import MinMaxScaler
# The default feature_range is (0, 1); pass (10, 50) to get the range asked for.
scaler = MinMaxScaler(feature_range=(10, 50))
df['SIZE'] = scaler.fit_transform(df['TOTAL'].values.reshape(-1, 1))
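For completeness, here is a hedged end-to-end sketch of the reshape approach on the question's data. It assumes scikit-learn is installed, and it flattens the result with ravel() before storing it, which is a stylistic choice here rather than a requirement. It also shows one reason you might keep the scaler object around: inverse_transform maps scaled values back to the original units.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Rebuild the example dataframe from the question.
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})

scaler = MinMaxScaler(feature_range=(10, 50))

# fit_transform expects 2D input and returns a 2D array of shape (n_rows, 1).
scaled = scaler.fit_transform(df["TOTAL"].to_numpy().reshape(-1, 1))

# Flatten back to 1D before storing it as a column.
df["SIZE"] = scaled.ravel()

# The fitted scaler can map scaled values back to the original units.
restored = scaler.inverse_transform(scaled).ravel()

print(df)
print(restored)
```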