Is there a standard way to do this?
Googling -- "approximate entropy" bits -- uncovers multiple academic papers but I'd like to just find a chunk of pseudocode defining the approximate entropy for a given bit string of arbitrary length.
(In case this is easier said than done and it depends on the application, my application involves 16,320 bits of encrypted data (cyphertext). But encrypted as a puzzle and not meant to be impossible to crack. I thought I'd first check the entropy but couldn't easily find a good definition of such. So it seemed like a question that ought to be on StackOverflow! Ideas for where to begin with de-cyphering 16k random-seeming bits are also welcome...)
See also this related question:
What is the computer science definition of entropy?
It is a measure of the information content of the string, and can be interpreted as the number of bits required to encode each character of the string given perfect compression. The entropy is maximal when each character is equally likely.
To compute Entropy the frequency of occurrence of each character must be found out. The probability of occurrence of each character can therefore be found out by dividing each character frequency value by the length of the string message.
It is calculated by knowing character set (lower alphabets, upper alphabets, numbers, symbols, etc.) used and the length of the created password. It is expressed in terms of bits of entropy per character.
Entropy is not a property of the string you got, but of the strings you could have obtained instead. In other words, it qualifies the process by which the string was generated.
In the simple case, you get one string among a set of N possible strings, where each string has the same probability of being chosen than every other, i.e. 1/N. In the situation, the string is said to have an entropy of N. The entropy is often expressed in bits, which is a logarithmic scale: an entropy of "n bits" is an entropy equal to 2n.
For instance: I like to generate my passwords as two lowercase letters, then two digits, then two lowercase letters, and finally two digits (e.g. va85mw24
). Letters and digits are chosen randomly, uniformly, and independently of each other. This process may produce 26*26*10*10*26*26*10*10 = 4569760000 distinct passwords, and all these passwords have equal chances to be selected. The entropy of such a password is then 4569760000, which means about 32.1 bits.
Shannon's entropy equation is the standard method of calculation. Here is a simple implementation in Python, shamelessly copied from the Revelation codebase, and thus GPL licensed:
import math def entropy(string): "Calculates the Shannon entropy of a string" # get probability of chars in string prob = [ float(string.count(c)) / len(string) for c in dict.fromkeys(list(string)) ] # calculate the entropy entropy = - sum([ p * math.log(p) / math.log(2.0) for p in prob ]) return entropy def entropy_ideal(length): "Calculates the ideal Shannon entropy of a string with given length" prob = 1.0 / length return -1.0 * length * prob * math.log(prob) / math.log(2.0)
Note that this implementation assumes that your input bit-stream is best represented as bytes. This may or may not be the case for your problem domain. What you really want is your bitstream converted into a string of numbers. Just how you decide on what those numbers are is domain specific. If your numbers really are just one and zeros, then convert your bitstream into an array of ones and zeros. The conversion method you choose will affect the results you get, however.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With