Calculating Entropy

Tags:

r

frequency

entropy

I've tried for several hours to calculate the Entropy and I know I'm missing something. Hopefully someone here can give me an idea!

EDIT: I think my formula is wrong!

CODE:

    info <- function(CLASS.FREQ){
      freq.class <- CLASS.FREQ
      info <- 0
      for(i in seq_along(freq.class)){
        if(freq.class[[i]] != 0){ # skip classes with zero frequency
          entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]]))  # entropy contribution of class i
        }else{
          entropy <- 0
        }
        info <- info + entropy # sum up the contributions from all classes
      }
      return(info)
    }

I hope my post is clear, since this is the first time I've posted here.

This is my dataset:

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")

student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")

income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) # we change the age from categorical to numeric


asked Dec 02 '14 16:12

Codex

2 Answers

I find no error in your code; it runs as written. What I think you are missing is the calculation of the class frequencies: once you pass those in, you get your answer. Looking through the objects you provide, I suspect the variable you are interested in is buys.

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys)/length(buys)
info(freqs)
[1] 0.940286

To improve your code, you can simplify it dramatically: you don't need a loop when you are given a vector of class frequencies, because the arithmetic in R is vectorised.

For example:

# calculate shannon-entropy
-sum(freqs * log2(freqs))
[1] 0.940286
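One caveat with the vectorised form: if a class has frequency 0, `0 * log2(0)` evaluates to NaN in R, so the zero check from the original loop is lost. A minimal sketch that keeps the check is below. The helper name `shannon_entropy` is hypothetical and not part of the original post.

```r
# Hypothetical helper: Shannon entropy (in bits) of a vector of class probabilities.
shannon_entropy <- function(p) {
  p <- as.numeric(p)   # drop table attributes, keep the plain numbers
  p <- p[p > 0]        # treat 0 * log2(0) as 0 by dropping zero-probability classes
  -sum(p * log2(p))
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys) / length(buys)

shannon_entropy(freqs)             # same value as the loop version above
shannon_entropy(c(0.5, 0.5, 0))    # the zero-probability class is ignored
```

Filtering before taking the logarithm avoids the NaN without needing an explicit loop.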

As a side note, the entropy package provides the function entropy.empirical. Its unit argument lets you choose the logarithm base (here log2), which gives you some more flexibility. Example:

entropy.empirical(freqs, unit="log2")
[1] 0.940286


answered Sep 20 '22 12:09

cdeterman

There is another way, similar to the answer above, that uses prop.table to compute the class probabilities.

> buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
> probabilities <- prop.table(table(buys))
> probabilities
buys
       no       yes 
0.3571429 0.6428571 
> -sum(probabilities*log2(probabilities))
[1] 0.940286

There is also a ready-made function for this in the entropy package (it is not part of base R): entropy.empirical(probabilities, unit = "log2")
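The same prop.table approach extends to the other categorical variables in the question. The following is only a sketch under that assumption; the helper name `column_entropy` is hypothetical, and age is left out because it is numeric.

```r
# Hypothetical helper: entropy (in bits) of a categorical (character) vector.
column_entropy <- function(x) {
  p <- prop.table(table(x))   # relative frequency of each observed value
  -sum(p * log2(p))           # for character vectors table() lists only observed values, so p > 0
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no",
          "yes", "yes", "yes", "yes", "yes", "no")
credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent",
            "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")
student <- c("no", "no", "no", "no", "yes", "yes", "yes",
             "no", "yes", "yes", "yes", "no", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low",
            "medium", "low", "medium", "medium", "medium", "high", "medium")

# One entropy value per variable, returned as a named numeric vector.
entropies <- sapply(
  list(buys = buys, credit = credit, student = student, income = income),
  column_entropy
)
entropies
```

Since student splits evenly into 7 "yes" and 7 "no", its entropy is exactly 1 bit, which is a quick sanity check on the helper.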

answered Sep 18 '22 12:09

arunppsg
