
Calculating Entropy

I've spent several hours trying to calculate the entropy, and I know I'm missing something. Hopefully someone here can point me in the right direction!

EDIT: I think my formula is wrong!

CODE:

    info <- function(CLASS.FREQ){
      freq.class <- CLASS.FREQ
      info <- 0
      for(i in seq_along(freq.class)){
        if(freq.class[[i]] != 0){ # zero check: log2(0) is -Inf
          entropy <- -(freq.class[[i]] * log2(freq.class[[i]])) # entropy term for class i
        }else{
          entropy <- 0
        }
        info <- info + entropy # sum the entropy terms from all classes
      }
      return(info)
    }

I hope my post is clear; this is my first time posting here.

This is my dataset:

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")

student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")

income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) # we change the age from categorical to numeric
Codex asked Dec 02 '14 16:12



2 Answers

I find no error in your code: it runs cleanly and returns the correct entropy when it is given class frequencies (proportions). The step I think you are missing is calculating those class frequencies before calling the function. Looking through the objects you provide, I suspect the variable of interest is buys.

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys)/length(buys)
info(freqs)
[1] 0.940286
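
To make the number above easier to verify by hand, here is the calculation written out. It assumes the standard Shannon entropy formula in bits and uses the counts in buys (5 "no" and 9 "yes" out of 14).

```latex
H = -\sum_{i} p_i \log_2 p_i
  = -\left( \tfrac{5}{14}\log_2\tfrac{5}{14} + \tfrac{9}{14}\log_2\tfrac{9}{14} \right)
  \approx 0.5305 + 0.4098
  \approx 0.9403
```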

You can also simplify your code considerably. Arithmetic and log2 are vectorized in R, so you don't need a loop when you already have a vector of class frequencies.

For example:

# calculate shannon-entropy
-sum(freqs * log2(freqs))
[1] 0.940286
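
One caveat with the one-liner: it drops the zero check from the original loop, and in R 0 * log2(0) evaluates to NaN. A minimal sketch of a vectorized version that keeps the check is below; shannon_entropy is a hypothetical helper name, not a base R function.

```r
# Shannon entropy (in bits) of a vector of class proportions.
# shannon_entropy is a hypothetical name used for illustration only.
shannon_entropy <- function(p) {
  p <- as.numeric(p)
  p <- p[p > 0]        # drop empty classes; 0 * log2(0) would give NaN
  -sum(p * log2(p))
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys) / length(buys)
shannon_entropy(freqs)

# An empty class no longer turns the result into NaN
shannon_entropy(c(0.5, 0.5, 0))
```

Filtering before taking the logarithm treats an empty class as contributing zero, which matches the else branch of the loop in the question.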

As a side note, the entropy package (which must be installed and loaded) provides the function entropy.empirical. Its unit argument selects the logarithm base, so setting it to "log2" gives the result in bits. Example:

library(entropy)
entropy.empirical(freqs, unit="log2")
[1] 0.940286
cdeterman answered Sep 20 '22 12:09


Here is another way, similar to the answer above but using prop.table to compute the class probabilities.

> buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

> probabilities <- prop.table(table(buys))

> probabilities
buys
       no       yes 
0.3571429 0.6428571 

> -sum(probabilities*log2(probabilities))

[1] 0.940286

There is also a ready-made function in the entropy package (it is not part of base R): entropy.empirical(probabilities, unit = "log2")
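
Building on the prop.table approach, the steps can be wrapped in a small function that takes the raw labels directly, which makes it easy to apply to the other categorical columns in the question. This is only a sketch; label_entropy is a hypothetical helper name.

```r
# Entropy (in bits) computed straight from a vector of class labels.
# label_entropy is a hypothetical name used for illustration only.
label_entropy <- function(x) {
  p <- prop.table(table(x))  # class proportions from the raw labels
  p <- p[p > 0]              # guard against unused factor levels
  -sum(p * log2(p))
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")
student <- c("no", "no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

label_entropy(buys)

# Entropy of each categorical column, returned as a named numeric vector
entropies <- sapply(
  list(buys = buys, credit = credit, student = student, income = income),
  label_entropy
)
entropies
```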

arunppsg answered Sep 18 '22 12:09