I know the formula for calculating entropy:
H(Y) = - ∑ (p(yj) * log2(p(yj)))
In words: select an attribute and, for each of its values, check the target attribute's value, so p(yj) is the fraction of patterns at node N that fall in category yj, e.g. one fraction for true target values and one for false.
But I have a dataset in which the target attribute is price, i.e. a continuous range. How do I calculate entropy for this kind of dataset?
(Reference: http://decisiontrees.net/decision-trees-tutorial/tutorial-5-exercise-2/)
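For reference, here is a minimal sketch of how I currently compute this for a boolean target (the sample labels below are made up):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(Y) = -sum over j of p(yj) * log2(p(yj)), where p(yj) is the
    fraction of patterns at the node that fall in category yj."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Boolean target: one fraction for true, one for false.
print(entropy([True, True, False, True, False]))  # ~0.971 bits
```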
Entropy is a measure of disorder or uncertainty, and the goal of machine learning models (and data scientists in general) is to reduce that uncertainty. To calculate the reduction of uncertainty about Y given an additional piece of information X, we subtract the entropy of Y given X from the entropy of Y alone: the difference H(Y) - H(Y|X) is the information gain.
For example, in a binary classification problem (two classes), we can calculate the entropy of the data sample as follows: Entropy = -(p(0) * log2(p(0)) + p(1) * log2(p(1)))
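As a rough sketch of that subtraction (the feature and label values below are made up), the information gain is H(Y) minus the weighted average entropy of Y within each value of X:

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(features, labels):
    """IG(Y, X) = H(Y) - H(Y|X), where H(Y|X) is the weighted
    average entropy of the labels within each feature value."""
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[x].append(y)
    n = len(labels)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - h_y_given_x

# Made-up example: X separates the labels fairly cleanly, so the gain is high.
X = ["a", "a", "a", "b", "b", "b"]
Y = [0, 0, 1, 1, 1, 1]
print(information_gain(X, Y))  # ~0.459 bits
```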
Entropy is a measure of the disorder or impurity in the information processed by a machine learning model, and it determines how a decision tree chooses to split the data. A simple example is flipping a coin: a flip has two possible outcomes, and a fair coin (p = 0.5 for each outcome) has the maximum entropy of 1 bit, while a biased coin has less (e.g. p(heads) = 0.9 gives roughly 0.47 bits).
You first need to discretise the data set in some way, e.g. sorting it numerically into a number of buckets. Many methods for discretisation exist, some supervised (i.e. taking into account the value of your target function) and some not. This paper outlines various techniques in fairly general terms; for more specifics there are plenty of discretisation algorithms in machine learning libraries such as Weka.
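As a minimal, unsupervised sketch of this idea (equal-width binning with an arbitrarily chosen bucket count, and made-up price values), you could discretise the continuous target and then apply the usual entropy formula to the bucket labels; supervised methods such as those in Weka would choose the cut points more carefully:

```python
import numpy as np
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Continuous target (e.g. prices); the values here are made up.
prices = np.array([12.5, 13.0, 14.2, 29.9, 31.5, 30.0, 55.0, 57.5, 60.1])

# Unsupervised equal-width discretisation into 3 buckets.
n_buckets = 3
edges = np.linspace(prices.min(), prices.max(), n_buckets + 1)
buckets = np.digitize(prices, edges[1:-1])  # bucket index per value

print(entropy(buckets.tolist()))  # entropy of the discretised target, in bits
```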
The entropy of a continuous distribution is called differential entropy. It can be estimated by assuming your data follows some distribution (a normal distribution, for example), estimating the parameters of that distribution from the data, and then using the fitted distribution to calculate an entropy value.
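As a sketch under a normality assumption (again with made-up data), the differential entropy of a Gaussian has the closed form 0.5 * log2(2 * pi * e * sigma^2) bits, so only the variance needs to be estimated:

```python
import math
import statistics

# Made-up continuous data; assume it is roughly normally distributed.
prices = [12.5, 13.0, 14.2, 29.9, 31.5, 30.0, 55.0, 57.5, 60.1]

sigma2 = statistics.variance(prices)  # estimate the variance from the data
h = 0.5 * math.log2(2 * math.pi * math.e * sigma2)
print(h)  # differential entropy in bits, under the Gaussian assumption
```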