This is a two-part question:
Part 1
The first part deals with calculating the entropy of a password in PHP. I have been unable to find any code examples that are empirically sound, and I would really like some help finding the 'right' way to calculate a final number. A lot of folks on the net have their own home-baked weighting algorithms, but I am really looking for the scientific answer to the equation.
I will be using the password entropy as just one part of a larger security system and as a way to analyze our overall data security based on information accessible if a user's password is compromised and how easily a password may be broken by brute force.
Part 2
The second part of this question is: how useful will this number really be? My end goal is to generate a 'score' for each password in the system that we can use to monitor our overall system security as a dynamic entity. I will probably have to work in another algorithm or two for dictionary attacks, l33t replacement passwords, etc., but I do feel that entropy will play an important role in such an 'overall' system rating. I do welcome suggestions for other approaches, though.
What I Know
I have seen some mention of logarithmic equations to calculate said entropy, but I have yet to see a good example that isn't actually written as a mathematical equation. I could really use a code example (even if not strictly in PHP) to get me going.
Extension
While writing a comment, I realized that I can better explain the usefulness of this calculation. When I am working on legacy systems where users have extremely weak passwords, I need concrete evidence of that weakness before I can make a case for forcing all users to change their passwords to a new (enforced) strong password. By storing a password strength score for each user account in the system, I can build several different metrics to show overall system weakness and make a case for stronger passwords.
TIA
Password entropy is based on the size of the character set used (which can be expanded by mixing lowercase letters, uppercase letters, numbers, and symbols) and on the password length. It estimates how difficult a given password would be to crack through guessing or brute force; on its own it does not account for dictionary attacks or other pattern-based methods.
Let R be the number of characters in the set and L the length of the password. Using the properties of logarithms, the formula E = log2(R^L) can be rewritten as E = L * log2(R). That is, we can compute the password entropy by first finding the entropy of one character drawn from the set of R characters, which is log2(R), and then multiplying it by the number of characters in the password, L.
It is calculated from the character set used (lowercase letters, uppercase letters, numbers, symbols, etc.) and the length of the password. Each character contributes log2(R) bits, and the total is expressed in bits of entropy.
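The formula E = L * log2(R) can be sketched in Java as below. This is only a rough illustration, not a vetted strength meter: the class name PoolEntropy, the method names, and the pool sizes (26 lowercase, 26 uppercase, 10 digits, and an assumed 33 printable ASCII symbols including the space) are choices made for this example. The result is only meaningful if every character was picked uniformly at random from that pool, which is rarely true of human-chosen passwords.

```java
final class PoolEntropy {
    // Estimates the pool size R from the character classes that appear in the password.
    static int poolSize(String password) {
        boolean lower = false, upper = false, digit = false, symbol = false;
        for (char c : password.toCharArray()) {
            if (c >= 'a' && c <= 'z') lower = true;
            else if (c >= 'A' && c <= 'Z') upper = true;
            else if (c >= '0' && c <= '9') digit = true;
            else symbol = true; // assumption: anything else counts as a symbol
        }
        return (lower ? 26 : 0) + (upper ? 26 : 0) + (digit ? 10 : 0) + (symbol ? 33 : 0);
    }

    // E = L * log2(R), in bits; returns 0 for an empty password.
    static double entropyBits(String password) {
        int r = poolSize(password);
        if (r == 0) return 0.0;
        return password.length() * (Math.log(r) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.printf("%.1f bits%n", entropyBits("password"));
        System.out.printf("%.1f bits%n", entropyBits("akj@!0aj"));
    }
}
```

Under these assumptions an all-lowercase 8-character password has R = 26 and scores 8 * log2(26), roughly 37.6 bits, regardless of whether it is a dictionary word.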
Entropy of a string has a formal definition specified here: http://en.wikipedia.org/wiki/Entropy_(information_theory)
How useful is that value going to be? It depends. Here's a method (in Java) to calculate entropy that I wrote for an assignment:
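// Shannon entropy in bits per character.
// Assumes static fields: Map<Character, Integer> count and int totalChars.
public static double entropy() {
    double h = 0, p;
    // Iterate over the counts themselves; a Map cannot be indexed by position.
    for (int c : count.values()) {
        p = c / (totalChars * 1.0);
        h -= p * Math.log(p) / Math.log(2);
    }
    return h;
}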
count is a Map where (key, value) corresponds to (char, countForChar). This obviously means you have to process the string before you call this method.
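That preprocessing step could be sketched as follows. The class name ShannonEntropy and the helper countChars are hypothetical names chosen for this example; the entropy loop is the same calculation as in the method above, just taking the string directly.

```java
import java.util.HashMap;
import java.util.Map;

final class ShannonEntropy {
    // Builds the (char, countForChar) map for the given string.
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> count = new HashMap<>();
        for (char c : s.toCharArray()) {
            count.merge(c, 1, Integer::sum); // add 1 to the running count for c
        }
        return count;
    }

    // Shannon entropy of the character distribution, in bits per character.
    static double entropy(String s) {
        if (s.isEmpty()) return 0.0; // avoid dividing by zero
        double h = 0;
        for (int n : countChars(s).values()) {
            double p = n / (double) s.length();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(entropy("password"));
        System.out.println(entropy("akj@!0aj"));
    }
}
```

Note that this counts UTF-16 code units, so characters outside the Basic Multilingual Plane would be split into two entries; for plain ASCII passwords that makes no difference.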
EDIT 2: Here's the same method, rewritten in PHP
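// Shannon entropy of $string, in bits per character.
function entropy($string) {
    $h = 0;
    $size = strlen($string);
    // count_chars(..., 1) returns only the byte values that occur, with their counts.
    foreach (count_chars($string, 1) as $v) {
        $p = $v / $size;
        $h -= $p * log($p) / log(2);
    }
    return $h;
}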
EDIT 3: There's a lot more to password strength than entropy. Entropy is about uncertainty, and higher entropy doesn't necessarily translate to more security. For example:
The entropy of "akj@!0aj" is 2.5, while the entropy of "password" is 2.75.
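Both figures can be checked by hand with the Shannon formula H = -sum(p_i * log2(p_i)), where p_i is the relative frequency of each distinct character. In "password" the letter s occurs twice and the other six letters once each; in "akj@!0aj" both a and j occur twice and the other four characters once each:

```latex
\begin{align*}
H(\texttt{password}) &= -\left[\tfrac{2}{8}\log_2\tfrac{2}{8} + 6\cdot\tfrac{1}{8}\log_2\tfrac{1}{8}\right] = 0.5 + 2.25 = 2.75 \\
H(\texttt{akj@!0aj}) &= -\left[2\cdot\tfrac{2}{8}\log_2\tfrac{2}{8} + 4\cdot\tfrac{1}{8}\log_2\tfrac{1}{8}\right] = 1 + 1.5 = 2.5
\end{align*}
```

Both values are in bits per character. The dictionary word scores higher only because it repeats fewer characters, which says nothing about how guessable it is.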