 

Machine learning algorithm for mixed categorical and numeric features

I have a training dataset of 1000 samples. It contains about 50 features, of which 30 are categorical, whereas the rest are numerical/continuous. Which algorithm is best suited to handle a mixed feature set of both categorical and continuous features?

user3207663 asked Nov 28 '25 17:11

1 Answer

In general, a preferred approach is to convert all your features into standardized continuous features.

  1. For features that were originally continuous, perform standardization: x_i = (x_i - mean(x)) / standard_deviation(x). That is, for each feature, subtract the mean of the feature and then divide by the standard deviation of the feature. An alternative is min-max scaling, which rescales each continuous feature into the range [0, 1]: x_i = (x_i - min(x)) / (max(x) - min(x)).

  2. For categorical features, binarize them (one-hot encoding): each possible value of the feature becomes its own continuous variable taking the value 0.0 or 1.0. For example, if you have a categorical variable "gender" that can take on the values MALE, FEMALE, and NA, create three binary variables IS_MALE, IS_FEMALE, and IS_NA, where each variable is 1.0 if the sample has that value and 0.0 otherwise. You can then perform standardization on these variables as in step 1.
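
The two steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, not a reference implementation: the helper names `standardize`, `min_max_scale`, and `one_hot` are hypothetical, and the sketch uses the population standard deviation (`statistics.pstdev`) because the answer does not say which variant to use. Constant features (zero standard deviation or zero range) are mapped to all zeros to avoid dividing by zero; that choice is also an assumption.

```python
import statistics

def standardize(values):
    """Step 1: z-score each value, (x - mean(x)) / std(x)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (assumption; sample std is also common)
    if std == 0:
        # Constant feature has no spread: map every value to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]

def min_max_scale(values):
    """Alternative for step 1: rescale into [0, 1], (x - min(x)) / (max(x) - min(x))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Zero range: same convention as above, all zeros (assumption).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

def one_hot(values, categories):
    """Step 2: one 0.0/1.0 indicator column per category, named IS_<category>."""
    return {f"IS_{c}": [1.0 if v == c else 0.0 for v in values] for c in categories}

ages = [20.0, 30.0, 40.0]
print(min_max_scale(ages))  # [0.0, 0.5, 1.0]

gender = ["MALE", "FEMALE", "NA"]
indicators = one_hot(gender, ["MALE", "FEMALE", "NA"])
print(indicators["IS_FEMALE"])  # [0.0, 1.0, 0.0]

# Optional: standardize an indicator column as in step 1.
print(standardize(indicators["IS_FEMALE"]))
```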

Now you have all your features as standardized continuous variables.
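
To apply both steps to a whole dataset, the statistics (mean, standard deviation, list of categories) are commonly computed once on the training samples and then reused for every new sample, so that training and test data are transformed identically. The sketch below assumes rows are Python dicts keyed by column name; the `MixedFeatureEncoder` class and its methods are hypothetical names chosen for illustration. It leaves the indicator columns as plain 0.0/1.0 (skipping the optional standardization mentioned in step 2), and it maps a category never seen during training to all zeros, which is only one possible choice.

```python
import statistics
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, int, str]

class MixedFeatureEncoder:
    """Hypothetical helper: converts mixed numeric/categorical rows
    into all-continuous feature vectors (standardize + one-hot)."""

    def __init__(self, numeric_cols: Sequence[str], categorical_cols: Sequence[str]):
        self.numeric_cols = list(numeric_cols)
        self.categorical_cols = list(categorical_cols)
        self.stats: Dict[str, Tuple[float, float]] = {}  # column -> (mean, std)
        self.categories: Dict[str, List[str]] = {}       # column -> sorted categories

    def fit(self, rows: List[Dict[str, Value]]) -> "MixedFeatureEncoder":
        # Learn per-column statistics from the training rows only.
        for col in self.numeric_cols:
            values = [float(r[col]) for r in rows]
            std = statistics.pstdev(values)
            # Use 1.0 for a constant column so it becomes all zeros (assumption).
            self.stats[col] = (statistics.fmean(values), std if std > 0 else 1.0)
        for col in self.categorical_cols:
            self.categories[col] = sorted({str(r[col]) for r in rows})
        return self

    def transform_row(self, row: Dict[str, Value]) -> List[float]:
        out: List[float] = []
        for col in self.numeric_cols:
            mean, std = self.stats[col]
            out.append((float(row[col]) - mean) / std)
        for col in self.categorical_cols:
            # One indicator per category seen in fit; unseen values give all zeros.
            out.extend(1.0 if str(row[col]) == c else 0.0 for c in self.categories[col])
        return out

    def feature_names(self) -> List[str]:
        names = list(self.numeric_cols)
        for col in self.categorical_cols:
            names.extend(f"{col}_IS_{c}" for c in self.categories[col])
        return names

# Usage with made-up example rows.
train = [
    {"age": 20, "income": 30000, "gender": "MALE"},
    {"age": 30, "income": 50000, "gender": "FEMALE"},
    {"age": 40, "income": 70000, "gender": "NA"},
]
enc = MixedFeatureEncoder(["age", "income"], ["gender"]).fit(train)
print(enc.feature_names())
# ['age', 'income', 'gender_IS_FEMALE', 'gender_IS_MALE', 'gender_IS_NA']
print(enc.transform_row({"age": 30, "income": 50000, "gender": "FEMALE"}))
# [0.0, 0.0, 1.0, 0.0, 0.0]
```

If scikit-learn is available, a `ColumnTransformer` combining `StandardScaler` for the continuous columns with `OneHotEncoder(handle_unknown="ignore")` for the categorical ones implements the same idea and is likely a better starting point than hand-written code.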

stackoverflowuser2010 answered Nov 30 '25 08:11