 

Shannon's entropy formula. Help my confusion

My understanding of the entropy formula is that it's used to compute the minimum number of bits required to represent some data. It's usually worded differently when defined, but that's the understanding I've relied on until now.

Here's my problem. Suppose I have a sequence of 100 '1's followed by 100 '0's, i.e. 200 bits. The alphabet is {0, 1} and the base of the entropy is 2. The probability of symbol '0' is 0.5 and of '1' is 0.5. So the entropy is 1, i.e. 1 bit to represent each bit.
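To make that arithmetic concrete, here is a minimal sketch (Python; shannon_entropy is just an illustrative helper, not anything from a library) that computes the per-symbol entropy from the message's empirical symbol frequencies:

    from collections import Counter
    from math import log2

    def shannon_entropy(message):
        # Empirical per-symbol entropy from the observed symbol frequencies.
        counts = Counter(message)
        total = len(message)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    bits = "1" * 100 + "0" * 100
    print(shannon_entropy(bits))   # 1.0 -- p('0') = p('1') = 0.5, so 1 bit per symbol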

However, you can run-length encode it with something like 100 / 1 / 100 / 0, where each pair is the number of bits to output followed by the bit itself. It seems like I have a representation smaller than the data, especially if you increase the 100 to a much larger number.

I'm using http://en.wikipedia.org/wiki/Information_entropy as a reference at the moment. Where did I go wrong? Is it the probability assigned to the symbols? I don't think it's wrong. Or did I get the connection between compression and entropy wrong? Anything else?

Thanks.

Edit

Following some of the answers, my follow-up questions are: would you apply the entropy formula to a particular instance of a message to try to find out its information content? Would it be valid to take the message "aaab" and say its entropy is ~0.811? If yes, then what's the entropy of 1...10...0 where the 1s and 0s are repeated n times? Is the answer 1?

Yes, I understand that you are creating a random variable over your input symbols and estimating the probability mass function from your message. What I'm trying to confirm is that the entropy formula does not take into account the position of the symbols in the message.
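As a quick check of both points, assuming the probabilities are simply the symbol frequencies observed in the message, here is the arithmetic for "aaab" (a sketch in Python):

    from math import log2

    # "aaab": p(a) = 3/4, p(b) = 1/4
    print(-(0.75 * log2(0.75) + 0.25 * log2(0.25)))   # ~0.811 bits per symbol

    # Any reordering ("abaa", "baaa", ...) has the same counts, hence the same entropy;
    # the formula only sees frequencies, never positions. Likewise 1...10...0 with equal
    # counts of 1s and 0s gives -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 for any n.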

Asked by Budric on Mar 16 '09




3 Answers

Or did I get the connection between compression and entropy wrong?

You're pretty close, but this last question is where the mistake lies. If you're able to compress something into a form that is smaller than its original representation, it means that the original representation had at least some redundancy. Each bit in the message really wasn't conveying 1 bit of information.

Because redundant data does not contribute to the information content of a message, it also does not increase its entropy. Imagine, for example, a "random bit generator" that only returns the value "0". This conveys no information at all! (Strictly speaking, a symbol that never appears would make its log term undefined, but by the usual convention 0 · log2(0) is taken as 0, so a message consisting of only one kind of symbol has entropy exactly 0.)
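A one-line check of that degenerate case (a sketch, assuming Python): the only symbol that ever appears has probability 1, and its term contributes nothing.

    from math import log2

    # A constant stream has one symbol with probability 1; absent symbols contribute
    # nothing under the 0 * log2(0) = 0 convention.
    print(-(1.0 * log2(1.0)))   # -0.0, i.e. zero bits per symbol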

By contrast, had you simulated a large number of random coin flips, it would be very hard to reduce the size of this message by much. Each bit would be contributing close to 1 bit of entropy.

When you compress data, you extract that redundancy. In exchange, you pay a one-time entropy price by having to devise a scheme that knows how to compress and decompress this data; that itself takes some information.

However, you can run-length encode it with something like 100 / 1 / 100 / 0, where each pair is the number of bits to output followed by the bit itself. It seems like I have a representation smaller than the data, especially if you increase the 100 to a much larger number.

To summarize, the fact that you could devise a scheme to make the encoding of the data smaller than the original data tells you something important. Namely, it says that your original data contained very little information.
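As a rough illustration (a sketch using Python's standard zlib module; a general-purpose compressor, so the byte counts are approximate and include some header overhead):

    import os
    import zlib

    redundant = b"\x01" * 100 + b"\x00" * 100   # highly redundant, like the asker's message
    random_ish = os.urandom(200)                # close to 1 bit of entropy per bit

    print(len(zlib.compress(redundant)))    # far fewer than 200 bytes -- lots of redundancy to squeeze out
    print(len(zlib.compress(random_ish)))   # roughly 200 bytes or more -- nothing left to squeeze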


Further reading

For a more thorough treatment of this, including exactly how you'd calculate the entropy for any arbitrary sequence of digits with a few examples, check out this short whitepaper.

Answered by John Feminella


Have a look at Kolmogorov complexity

The minimum number of bits into which a string can be compressed without losing information. This is defined with respect to a fixed, but universal decompression scheme, given by a universal Turing machine.

And in your particular case, don't restrict yourself to the alphabet {0,1}. For your example, use {0...0, 1...1} (a hundred 0's and a hundred 1's).
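Sketching that idea in Python (treating each hundred-bit run as one symbol of the larger alphabet):

    from collections import Counter
    from math import log2

    # Two symbols drawn from the alphabet {"0"*100, "1"*100}, each with probability 0.5.
    symbols = ["1" * 100, "0" * 100]
    counts = Counter(symbols)
    per_symbol = -sum((n / len(symbols)) * log2(n / len(symbols)) for n in counts.values())
    print(per_symbol * len(symbols))   # 2.0 -- about 2 bits for the whole 200-bit message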

Answered by Anonymous


Your encoding works in this example, but it is easy to conceive of an equally valid case: 010101010101..., which would be encoded as 1 / 0 / 1 / 1 / 1 / 0 / ... (one length/bit pair per run) and end up larger than the original.

Entropy is measured across all possible messages that can be constructed in the given alphabet, and not just pathological examples!
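A minimal sketch of that point (Python; run_length_encode is a hypothetical count/bit encoder along the lines the question describes):

    def run_length_encode(bits):
        # Collapse the string into (run length, bit) pairs, e.g. "111000" -> [(3, '1'), (3, '0')].
        runs = []
        for b in bits:
            if runs and runs[-1][1] == b:
                runs[-1] = (runs[-1][0] + 1, b)
            else:
                runs.append((1, b))
        return runs

    print(run_length_encode("1" * 100 + "0" * 100))   # [(100, '1'), (100, '0')] -- two short pairs
    print(len(run_length_encode("01" * 100)))         # 200 pairs -- bigger than the 200-bit original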

Answered by butterchicken