I'm learning about feature selection for big datasets. I came across the methods mutual_info_regression() and mutual_info_classif().
They return a value for all the features. What does that value represent?
They both measure the mutual information between a matrix containing a set of feature vectors and the target. They live under sklearn.feature_selection, since mutual information can be used to gauge how good a predictor a feature may be. Mutual information is a core concept in information theory, closely linked to entropy, which I'd suggest you start with. In short, the mutual information between two variables measures how much one variable (a feature) can explain another (the target), or, more technically, how much information about the target is obtained by observing the feature.
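To make this concrete, here is a minimal sketch using sklearn.metrics.mutual_info_score on two discrete variables (the toy data is made up for illustration). When the target is a deterministic function of the feature, their mutual information equals the entropy of the target, i.e. observing the feature tells you everything about the target:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Hypothetical toy data: the target is fully determined by the feature
# (feature 0 -> target 0; features 1 and 2 -> target 1).
feature = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 0, 1, 1, 1, 1])

mi = mutual_info_score(feature, target)  # returned in nats

# Entropy of the target: H = -sum(p * log p) with p = (1/3, 2/3)
h_target = -(1/3) * np.log(1/3) - (2/3) * np.log(2/3)

print(f"MI = {mi:.4f} nats, H(target) = {h_target:.4f} nats")
```

Because the feature fully determines the target here, the two printed values coincide; an uninformative feature would instead give a mutual information near zero.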
This is in fact the measure that decision trees trained with the Iterative Dichotomiser 3 (ID3) algorithm use internally to decide which feature to split on at each node, and which thresholds to set. The only difference between the two methods is the target type: mutual_info_classif is for discrete targets, and mutual_info_regression is for continuous targets.
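Here is a minimal sketch of how you might use mutual_info_classif for feature selection, using the built-in iris dataset as example data. It returns one non-negative score per feature; higher scores indicate features that carry more information about the target:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Load a small classification dataset (4 features, 3 classes)
data = load_iris()
X, y = data.data, data.target

# One mutual-information estimate per feature (higher = more informative);
# random_state fixes the nearest-neighbor estimator's randomness
scores = mutual_info_classif(X, y, random_state=0)

for name, score in zip(data.feature_names, scores):
    print(f"{name}: {score:.3f}")
```

You could then keep only the highest-scoring features, e.g. by passing mutual_info_classif as the scoring function to sklearn.feature_selection.SelectKBest.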