Normalization of a categorical variable

I have a dataset that contains gender as Male and Female. Using pandas, I converted Male to 1 and Female to 0, and the column now has data type int8. Now I want to normalize columns such as weight and height. What should be done with the gender column: should it be normalized or not? I am planning to use it in a linear regression.

Krissh asked Aug 17 '18 05:08



People also ask

Can you normalize categorical variables?

There is no need to normalize categorical variables. You are not very explicit about the type of analysis you are doing, but typically you are dealing with the categorical variables as dummy variables in the statistical analysis.

Do we need to standardize categorical variables?

It is common practice to standardize or center variables to make the data more interpretable in simple slopes analysis; however, categorical variables should never be standardized or centered.

Should I normalize dummy variables?

Normalizing dummy variables makes no sense. Usually, normalization is used when the variables are measured on different scales such that a proper comparison is not possible.

How do you normalize variable data?

When we normalize a variable, we first shift the scale so that it starts at 0, and then compress it so that it ends at 1. We do so by first subtracting the minimum value and then dividing by the new maximum value (which is the old maximum minus the old minimum), that is, x_norm = (x - min) / (max - min).
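As a minimal sketch of that min-max rule in plain Python (the function name `min_max_normalize` and the height values are hypothetical, chosen only for illustration):

```python
def min_max_normalize(values):
    """Rescale values to [0, 1]: subtract the minimum, then divide by (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are identical, so the range is zero and cannot be rescaled.
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / span for v in values]


heights_cm = [150, 165, 180]  # hypothetical example data
print(min_max_normalize(heights_cm))  # [0.0, 0.5, 1.0]
```

Libraries such as scikit-learn wrap the same idea (for example in `MinMaxScaler`), but the arithmetic is just the subtraction and division shown here.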


1 Answer

I think you are mixing up normalization and standardization.

Normalization:

rescales your data into the range [0, 1].

Standardization:

rescales your data to have a mean of 0 and a standard deviation of 1.
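For contrast with normalization, a minimal standardization (z-score) sketch using only the standard library might look like this. The function name `standardize` and the weight values are hypothetical; it uses the population standard deviation (`statistics.pstdev`), which is an assumption on my part (scikit-learn's `StandardScaler` makes the same choice, while `statistics.stdev` would give the sample version).

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant column has zero spread and cannot be standardized.
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]


weights_kg = [60.0, 70.0, 80.0]  # hypothetical example data
z = standardize(weights_kg)
print(z)  # middle value is 0.0; the outer values are symmetric around it
```

Unlike min-max normalization, the result is not confined to [0, 1]; it can take negative values and values above 1.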

Back to your question:

Your gender column already contains only the values 0 and 1, so it is already "normalized": min-max scaling would leave it unchanged. Your real question is therefore whether you should standardize it, and the answer is: yes, you could, but it doesn't really make sense. This was already discussed here: Should you ever standardise binary variables?
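A small sketch makes this concrete (the `gender` list below is hypothetical data, with 1 = male and 0 = female as in the question): min-max normalization returns the same 0/1 values, and standardization only maps them to two other numbers.

```python
import statistics

gender = [1, 0, 0, 1, 1]  # hypothetical data: 1 = male, 0 = female

# Min-max normalization leaves a 0/1 column unchanged:
# min is 0 and max is 1, so (x - 0) / (1 - 0) == x.
lo, hi = min(gender), max(gender)
normalized = [(g - lo) / (hi - lo) for g in gender]
print(normalized)  # [1.0, 0.0, 0.0, 1.0, 1.0]

# Standardizing still yields exactly two distinct values, only shifted and
# rescaled, so the column remains a two-level indicator.
mean = statistics.fmean(gender)  # share of 1s in the column
std = statistics.pstdev(gender)
standardized = [(g - mean) / std for g in gender]
print(sorted(set(standardized)))
```

For an ordinary least squares regression, such an affine rescaling of one regressor changes its coefficient and the intercept but not the fitted predictions. Keeping the raw 0/1 column preserves the simple reading of the gender coefficient as the estimated difference between male and female, holding the other variables fixed.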

Tim answered Oct 14 '22 02:10
