I have been reading about the ID3 algorithm recently. It says that the best attribute to select for splitting is the one that yields the maximum information gain, which is computed with the help of entropy.
I have written a simple Python function to compute the entropy. It is shown below:

import math

def _E(p, n):
    # binary entropy of a node with p positive and n negative examples
    x = p / (p + n)
    y = n / (p + n)
    return -(x * math.log2(x)) - (y * math.log2(y))
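As a sanity check, the function behaves as expected on non-degenerate inputs (the values in the comments follow directly from the definition above):

print(_E(5, 5))   # 1.0, a 50/50 split is maximally impure
print(_E(9, 1))   # ~0.469, a 9/1 split is much purer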
However, suppose we have a table consisting of 10 elements as follows:
x = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
Here, x is the attribute and y is the class label, so P(x=0) = 0.8 and P(x=1) = 0.2. The x=0 subset contains 4 positive and 4 negative labels, and the x=1 subset contains 2 positive labels, so the weighted entropy after splitting on x is:
Entropy(x) = 0.8 * _E(4, 4) + 0.2 * _E(2, 0)
However, the second subset (x = 1) is perfectly classified, and this causes a math domain error since log2(0) is undefined (it tends to negative infinity). How is the entropy calculated in such cases?
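For reference, this is a minimal way to reproduce the failure with the _E function above; the 0.2 * _E(2, 0) term is the one that raises the error:

# Raises ValueError: math domain error, because _E(2, 0) evaluates math.log2(0)
entropy_x = 0.8 * _E(4, 4) + 0.2 * _E(2, 0)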
Entropy is a measure of impurity, so if a node is pure its entropy is zero. By convention the term 0 * log2(0) is taken to be 0 (the limit of p * log2(p) as p approaches 0), which is why a perfectly classified subset contributes nothing to the weighted entropy.
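A minimal sketch of how the entropy function from your question could guard against this, keeping the same counts-based _E(p, n) signature:

import math

def _E(p, n):
    # convention: 0 * log2(0) is treated as 0, so a pure node has entropy 0
    if p == 0 or n == 0:
        return 0.0
    x = p / (p + n)
    y = n / (p + n)
    return -(x * math.log2(x)) - (y * math.log2(y))

With this guard, the example from your question evaluates cleanly: 0.8 * _E(4, 4) + 0.2 * _E(2, 0) = 0.8 * 1.0 + 0.2 * 0.0 = 0.8.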
With that handled, have a look at this for scoring a split:
def information_gain(data, column, cut_point):
    """
    For calculating the goodness of a split: the difference between the entropy of the
    parent and the weighted entropy of the children.
    :params: the data set as `data`, the attribute index as `column`, the cut point as `cut_point`
    :returns: the information gain of the partition, plus the two subsets
    """
    subset1, subset2 = divide_data(data, column, cut_point)
    lensub1, lensub2 = len(subset1), len(subset2)
    # if one side of the split is empty, the split changes nothing: return 0 gain
    if lensub1 == 0 or lensub2 == 0:
        return (0, subset1, subset2)
    weighted_ent = (lensub1 * entropy(subset1) + lensub2 * entropy(subset2)) / len(data)
    return ((entropy(data) - weighted_ent), subset1, subset2)
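The snippet above relies on divide_data and entropy helpers that are not shown. A minimal sketch of what they might look like is below; the row layout (class label in the last column) and the threshold-style split are assumptions on my part, not something stated above:

import math

def divide_data(data, column, cut_point):
    # hypothetical helper: split rows on a numeric attribute at `cut_point`
    subset1 = [row for row in data if row[column] < cut_point]
    subset2 = [row for row in data if row[column] >= cut_point]
    return subset1, subset2

def entropy(data):
    # hypothetical helper: entropy of the class labels, assumed to sit in the last column
    total = len(data)
    if total == 0:
        return 0.0
    counts = {}
    for row in data:
        label = row[-1]
        counts[label] = counts.get(label, 0) + 1
    # the 0 * log2(0) convention is handled implicitly: absent classes never appear in counts
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# example: a perfect split of four rows yields a gain of 1.0
gain, left, right = information_gain([[2.7, 0], [1.3, 1], [3.6, 0], [0.4, 1]], column=0, cut_point=2.0)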