 

How do I compute the approximate entropy of a bit string?

Is there a standard way to do this?

Googling -- "approximate entropy" bits -- uncovers multiple academic papers but I'd like to just find a chunk of pseudocode defining the approximate entropy for a given bit string of arbitrary length.

(In case this is easier said than done and it depends on the application, my application involves 16,320 bits of encrypted data (cyphertext). But encrypted as a puzzle and not meant to be impossible to crack. I thought I'd first check the entropy but couldn't easily find a good definition of such. So it seemed like a question that ought to be on StackOverflow! Ideas for where to begin with de-cyphering 16k random-seeming bits are also welcome...)

See also this related question:
What is the computer science definition of entropy?

asked Jun 05 '10 by dreeves



2 Answers

Entropy is not a property of the string you got, but of the strings you could have obtained instead. In other words, it qualifies the process by which the string was generated.

In the simple case, you get one string from a set of N possible strings, where each string has the same probability of being chosen as every other, i.e. 1/N. In that situation, the string is said to have an entropy of N. The entropy is often expressed in bits, which is a logarithmic scale: an entropy of "n bits" is an entropy equal to 2^n.

For instance: I like to generate my passwords as two lowercase letters, then two digits, then two lowercase letters, and finally two digits (e.g. va85mw24). Letters and digits are chosen randomly, uniformly, and independently of each other. This process may produce 26*26*10*10*26*26*10*10 = 4569760000 distinct passwords, and all of these passwords have an equal chance of being selected. The entropy of such a password is then 4569760000, which means about 32.1 bits.
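That figure is easy to reproduce: the number of bits is just the base-2 logarithm of the number of equally likely outcomes. A quick sketch in Python (variable names are my own):

```python
import math

# Number of distinct passwords of the form:
# 2 lowercase letters, 2 digits, 2 lowercase letters, 2 digits
n = 26 * 26 * 10 * 10 * 26 * 26 * 10 * 10
print(n)             # 4569760000
print(math.log2(n))  # ~32.09 bits
```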

answered Sep 18 '22 by Thomas Pornin


Shannon's entropy equation is the standard method of calculation. Here is a simple implementation in Python, shamelessly copied from the Revelation codebase, and thus GPL licensed:

import math

def entropy(string):
    "Calculates the Shannon entropy of a string"

    # get probability of chars in string
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]

    # calculate the entropy
    entropy = -sum([p * math.log(p) / math.log(2.0) for p in prob])

    return entropy

def entropy_ideal(length):
    "Calculates the ideal Shannon entropy of a string with given length"

    prob = 1.0 / length

    return -1.0 * length * prob * math.log(prob) / math.log(2.0)

Note that this implementation assumes that your input bit-stream is best represented as bytes. This may or may not be the case for your problem domain. What you really want is your bitstream converted into a string of numbers. Just how you decide on what those numbers are is domain specific. If your numbers really are just ones and zeros, then convert your bitstream into an array of ones and zeros. The conversion method you choose will affect the results you get, however.
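For the ones-and-zeros case, a minimal sketch of that conversion is to feed the bitstream in as a string of '0'/'1' characters (the `entropy` function is the one from the answer above, repeated so this snippet stands alone; the example bit strings are made up):

```python
import math

def entropy(string):
    "Calculates the Shannon entropy of a string"
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]
    return -sum(p * math.log(p, 2) for p in prob)

# A hypothetical bitstream represented as a string of '0'/'1' characters
bits = "0110100110010110"    # 8 ones, 8 zeros
print(entropy(bits))         # 1.0: both symbols equally frequent

biased = "0001000100000001"  # mostly zeros
print(entropy(biased))       # well below 1.0
```

With only two symbols the result is the entropy per bit, so it tops out at 1.0; a value much lower than that suggests the stream is biased or compressible.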

answered Sep 16 '22 by fmark