
What is Maximum Entropy?

Can someone give me a clear and simple definition of Maximum entropy classification? It would be very helpful if someone can provide a clear analogy, as I am struggling to understand.

Mr_Shoryuken asked May 14 '16 15:05



1 Answer

"Maximum Entropy" is synonymous with "Least Informative". You wouldn't want a classifier that was least informative, so the term refers to how the priors are established, not to the classifier's output. Frankly, "Maximum Entropy Classification" is an example of using buzz words.

For an example of an uninformative prior, consider a six-sided object. If you know nothing else about it, the probability that any given face will appear when the object is tossed is 1/6. This would be your starting prior. It's the least informative. You really wouldn't want to start with anything else or you will bias later calculations. Of course, if you have knowledge that one side will appear more often, you should incorporate that into your priors.
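To make "least informative" concrete, here is a minimal sketch that compares the Shannon entropy of the uniform 1/6 prior with a prior that favors one face. The loaded probabilities are made-up numbers chosen only for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform prior over six faces: nothing is known about the object.
uniform = [1 / 6] * 6

# Hypothetical loaded prior: we believe one face shows up half the time.
loaded = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

h_uniform = entropy(uniform)
h_loaded = entropy(loaded)

print(f"uniform prior entropy: {h_uniform:.3f} bits")
print(f"loaded prior entropy:  {h_loaded:.3f} bits")
```

The uniform prior has the larger entropy (log2 of 6 bits, the maximum possible for six outcomes), which is why it is the starting point when you have no other knowledge.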

The Bayes formula is P(H|E) = P(E|H)P(H)/P(D), where P(H) is the prior for the hypothesis and P(D) is the probability of the data (that is, of the evidence E), obtained by summing the numerator P(E|H)P(H) over all possible hypotheses.
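A small worked example of the formula, continuing the six-sided object. The two hypotheses, their priors, and the likelihoods are all assumed numbers for illustration: either the object is fair, or it is loaded so that a six appears half the time, and the evidence is a single toss that shows a six.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule: P(H|E) = P(E|H) P(H) / P(D)."""
    # One numerator P(E|H) P(H) per hypothesis.
    numerators = {h: likelihoods[h] * priors[h] for h in priors}
    # P(D) is the sum of all possible numerators.
    p_d = sum(numerators.values())
    return {h: n / p_d for h, n in numerators.items()}

# Least informative prior over the two hypotheses (assumed).
priors = {"fair": 0.5, "loaded": 0.5}

# P(E|H): probability of tossing a six under each hypothesis (assumed).
likelihoods = {"fair": 1 / 6, "loaded": 1 / 2}

post = posterior(priors, likelihoods)
print(post)
```

Starting from equal priors, one observed six moves the weight to 0.25 for "fair" and 0.75 for "loaded", because the loaded hypothesis explains the evidence three times better.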

For text classification where a missing word is to be inserted, E is some given document and H is the given word. In other words, the hypothesis is that H is the word which should be selected and P(H) is the weight given to the word.

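Maximum Entropy Text classification means: start with least informative weights (priors) and optimize to find weights that maximize the likelihood of the data, the P(D). Essentially, it's an iterative fitting procedure in the spirit of the EM algorithm (in practice, MaxEnt models are usually fit with iterative scaling or gradient-based optimizers).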

A simple Naive Bayes classifier would assume the prior weights are proportional to the number of times the word appears in the document. However, this ignores correlations between words.

The so-called MaxEnt classifier takes the correlations into account.

I can't think of a simple example to illustrate this but I can think of some correlations. For example, "the missing" in English should give higher weights to nouns as the next word, but a Naive Bayes classifier might give equal weight to a verb if its relative frequency were the same as a given noun. A MaxEnt classifier considering "the missing" would give more weight to nouns because they would be more likely in that context.
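As a rough sketch of the idea, a MaxEnt classifier can be written as multinomial logistic regression: every weight starts at zero, so the first predictions are uniform (least informative), and gradient ascent on the log-likelihood then shifts the weights toward what the data supports. The toy data, the single "previous word" feature, and the names `DATA`, `predict`, and `train` are assumptions made up for this illustration, not a reference implementation.

```python
import math

LABELS = ["noun", "verb"]

# Hypothetical toy corpus: (previous word, part of speech of the next word).
DATA = (
    [("the", "noun")] * 8 + [("the", "verb")] * 1
    + [("to", "verb")] * 8 + [("to", "noun")] * 1
)

def predict(weights, context):
    """Softmax over one weight per (context, label) pair; missing weights are 0."""
    scores = [weights.get((context, y), 0.0) for y in LABELS]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(LABELS, exps)}

def train(data, steps=200, lr=0.5):
    """Gradient ascent on the average log-likelihood, starting from zero weights."""
    weights = {}
    n = len(data)
    for _ in range(steps):
        grad = {}
        for context, label in data:
            probs = predict(weights, context)
            for y in LABELS:
                observed = 1.0 if y == label else 0.0
                # Gradient = observed feature count minus expected feature count.
                key = (context, y)
                grad[key] = grad.get(key, 0.0) + (observed - probs[y]) / n
        for key, g in grad.items():
            weights[key] = weights.get(key, 0.0) + lr * g
    return weights

before = predict({}, "the")      # untrained: uniform over the labels
weights = train(DATA)
after = predict(weights, "the")  # trained: context pulls weight toward nouns

print("before training:", before)
print("after training: ", after)
```

Before training, both labels get probability 0.5 after "the"; after training, the noun probability approaches the 8/9 share seen in the toy data. A real classifier would use many overlapping features of the context instead of one.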

DAV answered Oct 17 '22 00:10
