Machine learning algorithm for mixed categorical and numeric features

Question

I have a training dataset of 1000 samples. It contains about 50 features, of which 30 are categorical and the rest are numerical (continuous). Which algorithm is best suited to handle a mixed feature set of both categorical and continuous features?

stackoverflowuser2010 · Accepted Answer

In general, a preferred approach is to convert all your features into standardized continuous features.

1. For features that were originally continuous, perform standardization: x_i = (x_i - mean(x)) / standard_deviation(x). That is, for each feature, subtract the mean of the feature and then divide by the standard deviation of the feature. An alternative approach is to rescale the continuous features into the range [0, 1]: x_i = (x_i - min(x)) / (max(x) - min(x)).
2. For categorical features, perform binarization so that each possible value becomes its own continuous variable taking on the value 0.0 or 1.0. For example, if you have a categorical variable "gender" that can take on the values MALE, FEMALE, and NA, create three binary variables IS_MALE, IS_FEMALE, and IS_NA, where each variable can be 0.0 or 1.0. You can then perform standardization as in step 1.
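
The two steps above can be sketched in plain Python as follows. This is only a minimal illustration, not a definitive implementation: the helper names (`standardize`, `min_max_scale`, `binarize`) and the toy `ages`/`genders` data are hypothetical, and the use of the population standard deviation is an assumption, since the answer does not say which variant to use.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the feature mean, then divide by its standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # Constant feature: there is no spread to scale by, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Alternative: rescale the feature into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def binarize(values, categories):
    """Create one 0.0/1.0 column per category, named IS_<CATEGORY>."""
    return {
        f"IS_{c}": [1.0 if v == c else 0.0 for v in values]
        for c in categories
    }


# Hypothetical toy data: one continuous and one categorical feature.
ages = [20.0, 30.0, 40.0]
genders = ["MALE", "FEMALE", "NA"]

age_z = standardize(ages)        # step 1, z-score variant
age_01 = min_max_scale(ages)     # step 1, [0, 1] variant
gender_cols = binarize(genders, ["MALE", "FEMALE", "NA"])  # step 2
# Optionally standardize the binary columns as well, as in step 1.
gender_z = {name: standardize(col) for name, col in gender_cols.items()}
```

In practice the mean, standard deviation, minimum, maximum, and category list would be computed on the training set only and then reused unchanged when transforming new samples.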

Now you have all your features as standardized continuous variables.

Tags:

machine-learning

user3207663
