Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does crypt/blowfish generate the same hash with two different salts?

This question has to do with PHP's implementation of crypt(). For this question, the first 7 characters of the salt are not counted, so a salt '$2a$07$a' would be said to have a length of 1, as it is only 1 character of salt and seven characters of meta-data.

When using salt strings longer than 22 characters, there is no change in the hash generated (i.e., truncation), and when using strings shorter than 21 characters the salt will automatically be padded (with '$' characters, apparently); this is fairly straightforward. However, if given a salt 20 characters and a salt 21 characters, where the two are identical except for the final character of the 21-length salt, both hashed strings will be identical. A salt 22 characters long, which is identical to the 21 length salt except for the final character, the hash will be different again.

Example In Code:

$foo = 'bar';
$salt_xx = '$2a$07$';
$salt_19 = $salt_xx . 'b1b2ee48991281a439d';
$salt_20 = $salt_19 . 'a';
$salt_21 = $salt_20 . '2';
$salt_22 = $salt_21 . 'b';

var_dump(
    crypt($foo, $salt_19), 
    crypt($foo, $salt_20), 
    crypt($foo, $salt_21), 
    crypt($foo, $salt_22)
);

Will produce:

string(60) "$2a$07$b1b2ee48991281a439d$$.dEUdhUoQXVqUieLTCp0cFVolhFcbuNi"
string(60) "$2a$07$b1b2ee48991281a439da$.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2.UxGYN739wLkV5PGoR1XA4EvNVPjwylG"
string(60) "$2a$07$b1b2ee48991281a439da2O4AH0.y/AsOuzMpI.f4sBs8E2hQjPUQq"

Why is this?

EDIT:

Some users are noting that there is a difference in the overall string, which is true. In salt_20, offset (28, 4) is da$., while in salt_21, offset (28, 4) is da2.; however, it is important to note that the string generated includes the hash, the salt, as well as instructions to generate the salt (i.e. $2a$07$); the part in which the difference occurs is, in fact, still the salt. The actual hash is unchanged as UxGYN739wLkV5PGoR1XA4EvNVPjwylG.

Thus, this is in fact not a difference in the hash produced, but a difference in the salt used to store the hash, which is precisely the problem at hand: two salts are generating the same hash.

Rembmer: the output will be in the following format:

"$2a$##$saltsaltsaltsaltsaltsaHASHhashHASHhashHASHhashHASHhash"
//                            ^ Hash Starts Here, offset 28,32

where ## is the log-base-2 determining the number of iterations the algorithm runs for

Edit 2:

In the comments, it was requested that I post some additional info, as the user could not reproduce my output. Execution of the following code:

var_dump(
    PHP_VERSION, 
    PHP_OS, 
    CRYPT_SALT_LENGTH, 
    CRYPT_STD_DES, 
    CRYPT_EXT_DES, 
    CRYPT_MD5, 
    CRYPT_BLOWFISH
);

Produces the following output:

string(5) "5.3.0"
string(5) "WINNT"
int(60)
int(1)
int(1)
int(1)
int(1)

Hope this helps.

like image 468
Dereleased Avatar asked Feb 08 '10 23:02

Dereleased


2 Answers

After some experimentation, I have come to the conclusion that this is due to the way the salt is treated. The salt is not considered to be literal text, but rather to be a base64 encoded string, such that 22 bytes of salt data would actually represent a 16 byte string (floor(22 * 24 / 32) == 16) of salt. The "Gotcha!" with this implementation, though, is that, like Unix crypt, it uses a "non-standard" base64 alphabet. To be exact, it uses this alphabet:

./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789$

The 65th character, '$', is the padding character.

Now, the crypt() function appears to be capable of taking a salt of any length less than or equal to its maximum, and silently handling any inconsistencies in the base64 by discarding any data that doesn't make up another full byte. The crypt function will fail completely if you pass it characters in the salt that are not part of its base64 alphabet, which just confirms this theory of its operation.

Take an imaginary salt '1234'. This is perfectly base64 consistent in that it represents 24 bits of data, so 3 bytes, and does not carry any data that needs to be discarded. This is a salt whose Len Mod 4 is zero. Append any character to that salt, and it becomes a 5 character salt, and Len Mod 4 is now 1. However, this additional character represents only six bits of data, and therefore cannot be transformed into another full byte, so it is discarded.

Thus, for any two salts A and B, where

   Len A Mod 4 == 0 
&& Len B Mod 4 == 1  // these two lines mean the same thing
&& Len B = Len A + 1 // but are semantically important separately
&& A == substr B, 0, Len A

The actual salt used by crypt() to calculate the hash will, in fact, be identical. As proof, I'm including some example PHP code that can be used to show this. The salt constantly rotates in a seminon-random way (based on a randomish segment of the whirlpool hash of the current time to the microsecond), and the data to be hashed (herein called $seed) is simply the current Unix-Epoch time.

$salt = substr(hash('whirlpool',microtime()),rand(0,105),22);
$seed = time();
for ($i = 0, $j = strlen($salt); $i <= $j; ++$i) {
    printf('%02d = %s%s%c',
        $i,
        crypt($seed,'$2a$07$' . substr($salt, 0, $i)),
        $i%4 == 0 || $i % 4 == 1 ? ' <-' : '',
        0x0A
    );
}

And this produces output similar to the following

00 = $2a$07$$$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
01 = $2a$07$e$$$$$$$$$$$$$$$$$$$$.rBxL4x0LvuUp8rhGfnEKSOevBKB5V2. <-
02 = $2a$07$e8$$$$$$$$$$$$$$$$$$$.WEimjvvOvQ.lGh/V6HFkts7Rq5rpXZG
03 = $2a$07$e89$$$$$$$$$$$$$$$$$$.Ww5p352lsfQCWarRIWWGGbKa074K4/.
04 = $2a$07$e895$$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
05 = $2a$07$e8955$$$$$$$$$$$$$$$$.ZGSPawtL.pOeNI74nhhnHowYrJBrLuW <-
06 = $2a$07$e8955b$$$$$$$$$$$$$$$.2UumGVfyc4SgAZBs5P6IKlUYma7sxqa
07 = $2a$07$e8955be$$$$$$$$$$$$$$.gb6deOAckxHP/WIZOGPZ6/P3oUSQkPm
08 = $2a$07$e8955be6$$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
09 = $2a$07$e8955be61$$$$$$$$$$$$.5gox0YOqQMfF6FBU9weAz5RmcIKZoki <-
10 = $2a$07$e8955be616$$$$$$$$$$$.hWHhdkS9Z3m7/PMKn1Ko7Qf2S7H4ttK
11 = $2a$07$e8955be6162$$$$$$$$$$.meHPOa25CYG2G8JrbC8dPQuWf9yw0Iy
12 = $2a$07$e8955be61624$$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
13 = $2a$07$e8955be616246$$$$$$$$.vcp/UGtAwLJWvtKTndM7w1/30NuYdYa <-
14 = $2a$07$e8955be6162468$$$$$$$.OTzcPMwrtXxx6YHKtaX0mypWvqJK5Ye
15 = $2a$07$e8955be6162468d$$$$$$.pDcOFp68WnHqU8tZJxuf2V0nqUqwc0W
16 = $2a$07$e8955be6162468de$$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
17 = $2a$07$e8955be6162468deb$$$$.YDv5tkOeXkOECJmjl1R8zXVRMlU0rJi <-
18 = $2a$07$e8955be6162468deb0$$$.aNZIHogUlCn8H7W3naR50pzEsQgnakq
19 = $2a$07$e8955be6162468deb0d$$.ytfAwRL.czZr/K3hGPmbgJlheoZUyL2
20 = $2a$07$e8955be6162468deb0da$.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
21 = $2a$07$e8955be6162468deb0da3.0xhS8VgxJOn4skeI02VNI6jI6324EPe <-
22 = $2a$07$e8955be6162468deb0da3ucYVpET7X/5YddEeJxVqqUIxs3COrdym

The conclusion? Twofold. First, it's working as intended, and second, know your own salt or don't roll your own salt.

like image 104
Dereleased Avatar answered Nov 04 '22 19:11

Dereleased


Great answer, and clear explanation. But it seems to me there is either a bug in the implementation or some further explanation of the intent is needed {the comments to the post explain why there is not a bug}. The current php documentation states:

CRYPT_BLOWFISH - Blowfish hashing with a salt as follows: "$2a$", a two digit cost parameter, "$", and 22 base 64 digits from the alphabet "./0-9A-Za-z". Using characters outside of this range in the salt will cause crypt() to return a zero-length string. The two digit cost parameter is the base-2 logarithm of the iteration count for the underlying Blowfish-based hashing algorithmeter and must be in range 04-31, values outside this range will cause crypt() to fail.

This is consistent with what's been stated and demonstrated here. Unfortunately the documentation doesn't describe the return value very usefully:

Returns the hashed string or a string that is shorter than 13 characters and is guaranteed to differ from the salt on failure.

But as shown in the reply by Dereleased, if the input salt string is valid, the output consists of the input salt padded out to a fixed length with '$' characters, with the 32-character computed hash value appended to it. Unfortunately, the salt in the result is padded out to only 21 base64 digits, not 22! This is shown by the last three lines in that reply, where we see one '$' for 20 digits, no '$' for 21, and when there are 22 base64 digits in the salt, the first character of the hash result replaces the 22nd digit of the input salt. The function is still usable, because the complete value it computes is available to the caller as substr(crypt($pw,$salt), 28, 32), and the caller already knows the complete salt value because it passed that string as an argument. But it's very difficult to understand why the return value is designed so that it can only give you 126 bits of the 128-bit salt value. In fact, it's hard to understand why it includes the input salt at all; but omitting 2 bits of it is really unfathomable.

Here's a little snippet showing that the 22nd base64 digit contributes just two more bits to the salt actually used in the computation (there are only 4 distinct hashes produced):

$alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
$lim = strlen($alphabet);
$saltprefix = '$2a$04$123456789012345678901'; // 21 base64 digits


for ($i = 0; $i < $lim; ++$i ) {
  if ($i = 16 || $i == 32 || $i == 48) echo "\n";
  $salt = $saltprefix . substr($alphabet, $i, 1);
  $crypt = crypt($password, $salt);
  echo "salt ='$salt'\ncrypt='$crypt'\n";
}

salt ='$2a$04$123456789012345678901.'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901/'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901A'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901B'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901C'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901D'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901E'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901F'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901G'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901H'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901I'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901J'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901K'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901L'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901M'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'
salt ='$2a$04$123456789012345678901N'
crypt='$2a$04$123456789012345678901.YpaB4l25IJ3b3F3H8trjHXj5SC1UbUW'

salt ='$2a$04$123456789012345678901O'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
salty='$2a$04$123456789012345678901P'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
salty='$2a$04$123456789012345678901Q'
crypt='$2a$04$123456789012345678901Ots44xXtSV0f6zMrHerQ2IANdsJ.2ioG'
  ... 13 more pairs of output lines with same hash

salt ='$2a$04$123456789012345678901e'
crypt='$2a$04$123456789012345678901e.1cixwQ2qnBqwFeEcMfNfXApRK0ktqm'
  ... 15 more pairs of output lines with same hash

salt ='$2a$04$123456789012345678901u'
crypt='$2a$04$123456789012345678901u5yLyHIE2JetWU67zG7qvtusQ2KIZhAa'
  ... 15 more pairs of output lines with same hash

The grouping of the identical hash values also shows that the mapping of the alphabet actually used is most likely as written here, rather then in the order shown in the other reply.

Perhaps the interface was designed this way for some kind of compatibility, and perhaps because it has already shipped this way it can't be changed. {the first comment to the post explains why the interface is this way}. But certainly the documentation ought to explain what's going on. Just in case the bug might get fixed some day, perhaps it would be safest to obtain the hash value with:

substr(crypt($pw,$salt), -32)

As a final note, while the explanation of why the hash value repeats when the number of base64 digits specified mod 4 == 1 makes sense in terms of why code might behave that way, it doesn't explain why writing the code that way was a good idea. The code could and arguably should include the bits from a base64 digit that makes up a partial byte when computing the hash, instead of just discarding them. If the code had been written that way, then it seems likely the problem with losing the 22nd digit of the salt in the output would not have appeared, either. {As the comments to the post explain, even though the 22nd digit is overwritten, the digit of the hash that overwrites it will be only one of the four possible values [.Oeu], and these are the only significant values for the 22nd digit. If the 22nd digit is not one of those four values, it will be replaced by the one of those four that produces the same hash.}

In light of the comments, it seems clear there is no bug, just incredibly taciturn documentation :-) Since I'm not a cryptographer, I can't say this with any authority, but it seems to me that it's a weakness of the algorithm that a 21-digit salt apparently can produce all possible hash values, while a 22-digit salt limits the first digit of the hash to only one of four values.

like image 8
sootsnoot Avatar answered Nov 04 '22 21:11

sootsnoot