I have been reading about the ID3 algorithm recently. It says that the best attribute to select for splitting is the one that yields the maximum information gain, which is computed with the help of entropy.
I have written a simple Python function to compute the entropy. It is shown below:

import math

def _E(p, n):
    # binary entropy of a node with p positive and n negative examples
    x = p / (p + n)
    y = n / (p + n)
    return -(x * math.log2(x)) - (y * math.log2(y))
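As a sanity check, the function behaves as expected on non-degenerate inputs (the values in the comments follow directly from the definition above):

print(_E(5, 5))   # 1.0, a 50/50 split is maximally impure
print(_E(9, 1))   # ~0.469, a 9/1 split is much purer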
However, suppose we have a table consisting of 10 elements as follows:
x = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
Here, x is the attribute and y is the class label, so P(x=0) = 0.8 and P(x=1) = 0.2. The x=0 subset contains 4 positive and 4 negative labels, and the x=1 subset contains 2 positive labels, so the weighted entropy after splitting on x is:
Entropy(x) = 0.8 * _E(4, 4) + 0.2 * _E(2, 0)
However, the second subset (x = 1) is perfectly classified, and this causes a math domain error since log2(0) is undefined (it tends to negative infinity). How is the entropy calculated in such cases?
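For reference, this is a minimal way to reproduce the failure with the _E function above; the 0.2 * _E(2, 0) term is the one that raises the error:

# Raises ValueError: math domain error, because _E(2, 0) evaluates math.log2(0)
entropy_x = 0.8 * _E(4, 4) + 0.2 * _E(2, 0)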
Entropy is a measure of impurity, so if a node is pure its entropy is zero. By convention the term 0 * log2(0) is taken to be 0 (the limit of p * log2(p) as p approaches 0), which is why a perfectly classified subset contributes nothing to the weighted entropy.
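A minimal sketch of how the entropy function from your question could guard against this, keeping the same counts-based _E(p, n) signature:

import math

def _E(p, n):
    # convention: 0 * log2(0) is treated as 0, so a pure node has entropy 0
    if p == 0 or n == 0:
        return 0.0
    x = p / (p + n)
    y = n / (p + n)
    return -(x * math.log2(x)) - (y * math.log2(y))

With this guard, the example from your question evaluates cleanly: 0.8 * _E(4, 4) + 0.2 * _E(2, 0) = 0.8 * 1.0 + 0.2 * 0.0 = 0.8.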
With that handled, have a look at this for scoring a split:
def information_gain(data, column, cut_point):
    """
    For calculating the goodness of a split: the difference between the entropy of the
    parent and the weighted entropy of the children.
    :params: the data set as `data`, the attribute index as `column`, the cut point as `cut_point`
    :returns: the information gain of the partition, plus the two subsets
    """
    subset1, subset2 = divide_data(data, column, cut_point)
    lensub1, lensub2 = len(subset1), len(subset2)
    # if one side of the split is empty, the split changes nothing: return 0 gain
    if lensub1 == 0 or lensub2 == 0:
        return (0, subset1, subset2)
    weighted_ent = (lensub1 * entropy(subset1) + lensub2 * entropy(subset2)) / len(data)
    return ((entropy(data) - weighted_ent), subset1, subset2)
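The snippet above relies on divide_data and entropy helpers that are not shown. A minimal sketch of what they might look like is below; the row layout (class label in the last column) and the threshold-style split are assumptions on my part, not something stated above:

import math

def divide_data(data, column, cut_point):
    # hypothetical helper: split rows on a numeric attribute at `cut_point`
    subset1 = [row for row in data if row[column] < cut_point]
    subset2 = [row for row in data if row[column] >= cut_point]
    return subset1, subset2

def entropy(data):
    # hypothetical helper: entropy of the class labels, assumed to sit in the last column
    total = len(data)
    if total == 0:
        return 0.0
    counts = {}
    for row in data:
        label = row[-1]
        counts[label] = counts.get(label, 0) + 1
    # the 0 * log2(0) convention is handled implicitly: absent classes never appear in counts
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# example: a perfect split of four rows yields a gain of 1.0
gain, left, right = information_gain([[2.7, 0], [1.3, 1], [3.6, 0], [0.4, 1]], column=0, cut_point=2.0)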