I've tried for several hours to calculate the Entropy and I know I'm missing something. Hopefully someone here can give me an idea!
EDIT: I think my formula is wrong!
CODE:
info <- function(CLASS.FREQ){
  freq.class <- CLASS.FREQ
  info <- 0
  for(i in 1:length(freq.class)){
    if(freq.class[[i]] != 0){ # zero check in class
      entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]])) # I calculate the entropy for each class i here
    }else{
      entropy <- 0
    }
    info <- info + entropy # sum up entropy from all classes
  }
  return(info)
}
I hope my post is clear; this is the first time I've posted here.
This is my dataset:
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")
student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")
age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) # we change the age from categorical to numeric
Entropy is a measure of the disorder or randomness of a system. In thermodynamics one usually considers only the change in entropy between an initial and a final state. The Shannon entropy asked about here is different: it is computed directly from the class probabilities, so it has a definite value for a given distribution.
Ultimately I find no error in your code, as it runs without error. The part I think you are missing is the calculation of the class frequencies; pass those to your function and you will get your answer. Quickly running through the different objects you provide, I suspect you are looking at buys.
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys)/length(buys)
info(freqs)
[1] 0.940286
As a matter of improving your code, you can simplify this dramatically as you don't need a loop if you are provided a vector of class frequencies.
For example:
# calculate shannon-entropy
-sum(freqs * log2(freqs))
[1] 0.940286
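One caveat with the one-liner: unlike the loop in the question, it has no zero check, and in R `0 * log2(0)` evaluates to NaN. A minimal sketch of a vectorized version that keeps the zero check follows; the name `shannon_entropy` is a hypothetical helper, not something from the original code or from any package.

```r
# Hypothetical helper (name chosen for illustration only).
# Expects a vector or table of class probabilities that sum to 1.
shannon_entropy <- function(p) {
  p <- p[p > 0]        # drop empty classes, since 0 * log2(0) is NaN in R
  -sum(p * log2(p))    # Shannon entropy in bits
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys) / length(buys)

shannon_entropy(freqs)             # same value as the one-liner above
shannon_entropy(c(0.5, 0.5, 0))    # the empty third class is ignored
```

Dropping the zero entries follows the usual convention that an empty class contributes nothing to the entropy, which is what the `else` branch in the original loop does.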
As a side note, the function entropy.empirical is in the entropy package, where you can set the units to log2, allowing some more flexibility. Example:
entropy.empirical(freqs, unit="log2")
[1] 0.940286
There is another way, similar to the above answer but using a different function.
> buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
> probabilities <- prop.table(table(buys))
> probabilities
buys
       no       yes 
0.3571429 0.6428571 
> -sum(probabilities*log2(probabilities))
[1] 0.940286
There is also a ready-made function in the entropy package (not in base R): entropy.empirical(probabilities, unit = "log2")
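If you want the entropy of each categorical column from the question rather than just buys, the same prop.table approach can be wrapped in a small function and applied to all of them. This is only a sketch: `entropy_of` is a made-up helper name, and it assumes every level that appears in a vector occurs at least once, which is always true for `table()` on a character vector.

```r
# Data copied from the question.
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")
student <- c("no", "no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

# Hypothetical helper: entropy (in bits) of the class distribution of x.
entropy_of <- function(x) {
  p <- prop.table(table(x))   # class proportions
  -sum(p * log2(p))
}

# Named numeric vector with one entropy value per column.
res <- sapply(list(buys = buys, credit = credit, student = student, income = income), entropy_of)
res
```

The value for buys matches the 0.940286 shown above. The numeric age column is left out because it would need to be binned first.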