I'm learning about feature selection for big datasets. I came across the methods mutual_info_regression() and mutual_info_classif().
They return a value for all the features. What does that value represent?
They both measure the mutual information between a matrix containing a set of feature vectors and the target. They live under sklearn.feature_selection, since mutual information can be used to gauge how good a predictor a feature may be. Mutual information is a core concept in information theory, closely linked to entropy, which I'd suggest you start with. In short, the mutual information between two variables measures how much one variable (a feature) can explain another (the target), or, more technically, how much information about the target is obtained by observing the feature.
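To make this concrete, here is a minimal sketch using sklearn.metrics.mutual_info_score on two discrete variables (the toy data is made up for illustration). When the target is a deterministic function of the feature, their mutual information equals the entropy of the target, i.e. observing the feature tells you everything about the target:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Hypothetical toy data: the target is fully determined by the feature
# (feature 0 -> target 0; features 1 and 2 -> target 1).
feature = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 0, 1, 1, 1, 1])

mi = mutual_info_score(feature, target)  # returned in nats

# Entropy of the target: H = -sum(p * log p) with p = (1/3, 2/3)
h_target = -(1/3) * np.log(1/3) - (2/3) * np.log(2/3)

print(f"MI = {mi:.4f} nats, H(target) = {h_target:.4f} nats")
```

Because the feature fully determines the target here, the two printed values coincide; an uninformative feature would instead give a mutual information near zero.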
This is in fact the measure that decision trees trained with the Iterative Dichotomiser 3 (ID3) algorithm use internally to decide which feature to split on at each node, and which thresholds to set. The only difference between the two methods is the target type: mutual_info_classif is for discrete targets, and mutual_info_regression is for continuous targets.
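Here is a minimal sketch of how you might use mutual_info_classif for feature selection, using the built-in iris dataset as example data. It returns one non-negative score per feature; higher scores indicate features that carry more information about the target:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

# Load a small classification dataset (4 features, 3 classes)
data = load_iris()
X, y = data.data, data.target

# One mutual-information estimate per feature (higher = more informative);
# random_state fixes the nearest-neighbor estimator's randomness
scores = mutual_info_classif(X, y, random_state=0)

for name, score in zip(data.feature_names, scores):
    print(f"{name}: {score:.3f}")
```

You could then keep only the highest-scoring features, e.g. by passing mutual_info_classif as the scoring function to sklearn.feature_selection.SelectKBest.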