
Information gain on a non-discrete dataset

Jiawei Han's book on Data Mining, 2nd edition (Attribute Selection Measures, pp. 297-300), explains how to calculate the information gain achieved by each attribute (age, income, credit_rating) with respect to the class (buys_computer: yes or no). In that example, each attribute value is discrete: e.g. age can be youth/middle-aged/senior, income can be high/low/medium, credit_rating can be fair/excellent, etc.

I would like to know how the same information gain can be applied to attributes that take non-discrete values. For example, the income attribute could take any currency amount like 100.68, 120.90, and so on. If there are 1000 students, there could be 1000 different amount values.

How can we apply the same information gain over non-discrete data? Any tutorial/sample example/video URL would be of great help.

asked Oct 04 '13 by blue piranha

People also ask

What is information gain in data?

Information gain is the reduction in entropy or surprise by transforming a dataset and is calculated by comparing the entropy of the dataset before and after a transformation.

How is information gain calculated?

The Gini index is measured by subtracting the sum of squared class probabilities from one. Information gain, by contrast, is based on entropy: for each class, multiply the probability of the class by the log (base 2) of that probability, sum these terms, and negate the result; the gain is the reduction in that entropy before versus after a split.
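For example (plain arithmetic, not tied to the question's dataset): a node with 9 "yes" and 5 "no" tuples has entropy -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940 bits, and the information gain of a candidate split is that value minus the size-weighted average entropy of the resulting child nodes.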

What are the differences between the information gain and the entropy of a dataset?

Entropy is the uncertainty/randomness in the data: the more the randomness, the higher the entropy. Information gain uses entropy to make decisions: the lower the entropy after a split, the more information has been gained. Information gain is used in decision trees and random forests to decide the best split.

What is information gain in DT?

In a decision tree, the information gain at a node can be defined as the reduction in entropy achieved by splitting that node to make further decisions.


2 Answers

When your target variable is discrete (categorical), you just calculate entropy over the empirical distribution of categories in the left/right split you're considering, and compare their weighted average to the entropy without the split.
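A minimal sketch of that calculation in Python (the function names are just illustrative, and a two-way split is assumed):

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy (in bits) of the empirical class distribution
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(parent, left, right):
        # Entropy without the split minus the size-weighted entropy with it
        n = len(parent)
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(parent) - weighted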

For a continuous target variable, like income, this is defined analogously as differential entropy. For your purpose you would assume that the values in your set have a normal distribution, and calculate the differential entropy accordingly. From Wikipedia:

h(X) = ½ ln(2πeσ²), the differential entropy of a normal distribution with variance σ².

That is, it's just a function of the variance of the values. Note that this is in nats, not bits of entropy. To compare it to the Shannon entropy above, you'd have to convert, which is just a multiplication (multiply by log₂ e, i.e. divide by ln 2, to go from nats to bits).
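A small sketch of that computation, assuming a normal fit to the observed values (the function name is made up; statistics.pvariance is the population variance):

    import math
    import statistics

    def gaussian_differential_entropy(values, in_bits=False):
        # 0.5 * ln(2 * pi * e * sigma^2): differential entropy of a normal
        # distribution with the sample's variance, in nats by default
        h = 0.5 * math.log(2 * math.pi * math.e * statistics.pvariance(values))
        return h / math.log(2) if in_bits else h  # nats -> bits is a constant factor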

answered Oct 23 '22 by Sean Owen


The most common way to do splitting for a continuous variable (1-d) is to pick a threshold (from a discretized set of thresholds, or you can choose a prior). So you can compute the information gain for a continuous attribute by first sorting its values (you have to have an order) and then scanning them for the best threshold. http://dilekylmzr.files.wordpress.com/2011/09/data-mining_lecture9.ppt
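A rough, self-contained sketch of that sort-and-scan search (taking candidate thresholds at the midpoints between adjacent distinct values is one common choice, not the only one):

    import math
    from collections import Counter

    def class_entropy(labels):
        # Shannon entropy (bits) of the class labels
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_threshold(values, labels):
        # Sort (value, label) pairs, then try the midpoint between each pair of
        # adjacent distinct values and keep the threshold with the highest gain.
        pairs = sorted(zip(values, labels))
        parent = class_entropy(labels)
        best_gain, best_t = 0.0, None
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue
            t = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [lab for _, lab in pairs[:i]]
            right = [lab for _, lab in pairs[i:]]
            gain = parent - (len(left) * class_entropy(left)
                             + len(right) * class_entropy(right)) / len(pairs)
            if gain > best_gain:
                best_gain, best_t = gain, t
        return best_t, best_gain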

Example of using this technique in random forests

This technique is often used in random forests (and decision trees), so I will post a few references to resources on that.

More information on random forests and this technique can be found here: http://www.cs.ubc.ca/~nando/540-2013/lectures.html . Watch the lectures on YouTube, because the slides on their own are not very informative. The lecture describes how body parts are matched using random forests in Kinect, so it is quite interesting. You can also look at https://research.microsoft.com/pubs/145347/bodypartrecognition.pdf - the original paper discussed in the lecture.

Note that for information gain you can also use Gaussian entropy. It basically means fitting a Gaussian to the data before and after the split and comparing the two differential entropies.
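A hedged sketch of that variant (fit a normal distribution to the continuous target values before and after the split and take the drop in differential entropy as the gain; the function names are made up):

    import math
    import statistics

    def gaussian_entropy(values):
        # Differential entropy of a normal fit: 0.5 * ln(2 * pi * e * variance), in nats
        return 0.5 * math.log(2 * math.pi * math.e * statistics.pvariance(values))

    def gaussian_information_gain(parent, left, right):
        # Entropy of the node before the split minus the size-weighted
        # entropy of the two children after it
        n = len(parent)
        after = (len(left) / n) * gaussian_entropy(left) + (len(right) / n) * gaussian_entropy(right)
        return gaussian_entropy(parent) - after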

answered Oct 23 '22 by kudkudak