How can sklearn select categorical features based on feature selection

Question

I want to run feature selection on data with several categorical variables. I used get_dummies in pandas to generate the dummy (one-hot) columns for these categorical variables. How does sklearn know that a given group of dummy columns actually belongs to one feature, so that it selects or drops them all together? For example, I have a variable called city with three levels: New York, Chicago and Boston. The encoded rows look like:

[1,0,0] [0,1,0] [0,0,1]

How can I tell sklearn that these three "columns" actually belong to one feature, city, so that it won't end up keeping New York and deleting Chicago and Boston?

Thank you so much!

Fred Foo · Accepted Answer

You can't. The feature selection routines in scikit-learn will consider the dummy variables independently of each other. This means they can "trim" the domains of categorical variables down to the values that matter for prediction.
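To illustrate, here is a minimal sketch, assuming a toy dataset invented for this example (the city values come from the question; the target y and the choice of SelectKBest with chi2 are assumptions, not something the question specified). It shows that the selector scores each dummy column on its own and can keep one level of city while dropping the others:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data: only "New York" is associated with the positive class.
df = pd.DataFrame({
    "city": ["New York", "Chicago", "Boston", "New York",
             "Chicago", "Boston", "New York", "Boston"],
})
y = [1, 0, 0, 1, 0, 0, 1, 0]

# One dummy column per level: city_Boston, city_Chicago, city_New York.
X = pd.get_dummies(df, columns=["city"], dtype=int)

# The selector sees three unrelated columns; it has no notion that they
# all came from the single variable "city".
selector = SelectKBest(chi2, k=1).fit(X, y)
selected = list(X.columns[selector.get_support()])

print(dict(zip(X.columns, selector.scores_)))  # one score per dummy column
print(selected)
```

With this toy data only the city_New York column survives, which is the "trimming" behaviour described above: the levels that do not help prediction are dropped individually.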

Tags:

python

scikit-learn

feature-selection

MYjx
