
What are mutual_info_regression and mutual_info_classif used for in scikit-learn?

I'm learning about feature selection for large datasets. I came across functions called mutual_info_regression() and mutual_info_classif().

They return a value for each feature. What does that value represent?

Jay Puri Goswami asked Nov 15 '25 13:11



1 Answer

They both estimate the mutual information between each feature in a feature matrix and the target. They live under sklearn.feature_selection because mutual information gives some insight into how good a predictor a feature may be. Mutual information is a core concept in information theory and is closely linked to entropy, which I would suggest you start with. In short, the mutual information between two variables measures how much one variable (a feature) can explain another (the target), or more technically, how much information about the target variable is obtained by observing the feature. The value returned for each feature is that estimate: it is non-negative, 0 means the feature and the target are independent, and higher values mean stronger dependence.
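To make the link with entropy concrete, here is a minimal sketch for discrete variables only, using the identity I(feature; target) = H(target) - H(target | feature). The helper names entropy and mutual_information and the toy data are my own, for illustration. This is not how scikit-learn computes its estimates.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(feature, target):
    """I(feature; target) = H(target) - H(target | feature), in bits."""
    n = len(target)
    conditional = 0.0
    for value, count in Counter(feature).items():
        # Target values observed when the feature takes this value
        subset = [t for f, t in zip(feature, target) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(target) - conditional


# Toy data (hypothetical): a balanced binary target and two features
target = [0, 0, 1, 1, 0, 0, 1, 1]
informative = [0, 0, 1, 1, 0, 0, 1, 1]    # identical to the target
uninformative = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of the target

mi_informative = mutual_information(informative, target)      # 1 bit
mi_uninformative = mutual_information(uninformative, target)  # 0 bits
```

The first feature removes all uncertainty about the target, so its mutual information equals the target's entropy (1 bit). The second removes none, so its mutual information is 0.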

This is, in fact, the measure that decision trees built with the ID3 (Iterative Dichotomiser 3) algorithm use internally, under the name information gain, to decide which feature to split on at each node. The only difference between the two functions is the type of target: mutual_info_classif works for discrete targets and mutual_info_regression for continuous targets.
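As a rough usage sketch, the two functions can be called on the same feature matrix with a continuous and a discrete target. The synthetic data, the variable names and the choice of random_state=0 are assumptions made for this example.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(0)
n_samples = 500

# Three continuous features; each target below depends on only one of them
X = rng.normal(size=(n_samples, 3))

# Continuous target driven by feature 0 (plus a little noise)
y_continuous = X[:, 0] + 0.1 * rng.normal(size=n_samples)

# Discrete target driven by feature 1
y_discrete = (X[:, 1] > 0).astype(int)

# One estimated mutual information value per feature
mi_reg = mutual_info_regression(X, y_continuous, random_state=0)
mi_clf = mutual_info_classif(X, y_discrete, random_state=0)

print("regression target:    ", mi_reg)
print("classification target:", mi_clf)
```

Each call returns an array with one score per column of X. With this setup, feature 0 should score highest for the continuous target and feature 1 for the discrete one, while the unrelated features stay close to 0.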

yatu answered Nov 17 '25 10:11



