Adding StandardScaler() of values as new column to DataFrame returns partly NaNs

Question

I have a pandas DataFrame:

df['total_price'].describe()

returns

count    24895.000000
mean       216.377369
std        161.246931
min          0.000000
25%        109.900000
50%        174.000000
75%        273.000000
max       1355.900000
Name: total_price, dtype: float64

When I apply preprocessing.StandardScaler() to it:

x = df[['total_price']]
standard_scaler = preprocessing.StandardScaler()
x_scaled = standard_scaler.fit_transform(x)
df['new_col'] = pd.DataFrame(x_scaled)

My new column with the standardized values contains some NaNs:

df[['total_price', 'new_col']].head()

    total_price new_col
0   241.95      0.158596
1   241.95      0.158596
2   241.95      0.158596
3   81.95      -0.833691
4   81.95      -0.833691

df[['total_price', 'new_col']].tail()

        total_price new_col
28167   264.0       NaN
28168   264.0       NaN
28176   94.0        NaN
28177   166.0       NaN
28178   166.0       NaN

What's going wrong here?

iacob · Accepted Answer

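The index of your dataframe has gaps: in the tail() output the labels jump from 28168 to 28176, and the last label (28178) is larger than the row count (24895).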

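When you call pd.DataFrame(x_scaled) you create a new frame with a contiguous index (0, 1, 2, ...). Assigning it as a column of the original dataframe aligns the two on index labels, so every row whose label does not exist in the new frame gets NaN. You can resolve this by resetting the index of the original dataframe before assigning (df = df.reset_index(drop=True); reset_index returns a new dataframe rather than modifying df) or by updating x in place with x.update(...). Note that update also aligns on index and column labels, so the scaled array first has to be wrapped in a DataFrame that carries x's index and columns.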
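A minimal sketch of the alignment problem and two ways around it that leave the original index untouched. The toy frame, its index labels, and the column names misaligned, new_col and new_col_array are made up for illustration; only total_price and the StandardScaler call come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame whose index has gaps, e.g. after rows were filtered out.
df = pd.DataFrame(
    {"total_price": [241.95, 81.95, 264.0, 94.0, 166.0]},
    index=[0, 1, 5, 8, 9],
)

# fit_transform returns a plain NumPy array of shape (5, 1); it carries no index.
x_scaled = StandardScaler().fit_transform(df[["total_price"]])

# Reproduces the problem: the new frame is labelled 0..4, so only the
# labels 0 and 1 find a match in df; labels 5, 8 and 9 become NaN.
df["misaligned"] = pd.DataFrame(x_scaled)

# Option 1: attach the original index so that alignment succeeds.
df["new_col"] = pd.Series(x_scaled[:, 0], index=df.index)

# Option 2: assign the bare array; arrays are assigned by position,
# so no index alignment takes place at all.
df["new_col_array"] = x_scaled.ravel()

print(df)
```

Both options give the same values. Which one reads better is a matter of taste; passing index=df.index makes the intent explicit, while assigning the array is shorter.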


Tags:

python

pandas

nan

scikit-learn

zinyosrim
