PHP - What is a good way to produce a short alphanumeric string from a long md5 hash?

Tags: php, random, base

This is for the purpose of having a nice short URL which refers to an md5 hash in a database. I would like to convert something like this:

a7d2cd9e0e09bebb6a520af48205ced1

into something like this:

hW9lM5f27

Those both contain about the same amount of information. The method doesn't have to be direct and reversible, but that would be nice (more flexible). At the least I would want a randomly generated string with the hex hash as the seed, so that it is reproducible. I'm sure there are many possible answers, and I am curious to see how people would do it in an elegant way.

Oh, this doesn't have to have perfect 1:1 correspondence with the original hash, but that would be a bonus (I guess I already implied that with the reversibility criterion). And I would like to avoid collisions if possible.

EDIT: I realized my initial calculations were totally wrong (thanks to the people answering here, though it took me a while to clue in): you can't really reduce the string length very much by throwing all the lowercase and uppercase letters into the mix. So I guess I will want something that doesn't directly convert from hex to base 62.

Moss asked Jul 22 '10 22:07



1 Answer

Here's a little function for consideration:

/** Return 22-char compressed version of 32-char hex string (eg from PHP md5). */
function compress_md5($md5_hash_str) {
    // (we start with 32-char $md5_hash_str eg "a7d2cd9e0e09bebb6a520af48205ced1")
    $md5_bin_str = "";
    foreach (str_split($md5_hash_str, 2) as $byte_str) { // ("a7", "d2", ...)
        $md5_bin_str .= chr(hexdec($byte_str));
    }
    // ($md5_bin_str is now a 16-byte string equivalent to $md5_hash_str)
    $md5_b64_str = base64_encode($md5_bin_str);
    // (now it's a 24-char string version of $md5_hash_str eg "p9LNng4JvrtqUgr0ggXO0Q==")
    $md5_b64_str = substr($md5_b64_str, 0, 22);
    // (but we know the last two chars will be ==, so drop them eg "p9LNng4JvrtqUgr0ggXO0Q")
    $url_safe_str = str_replace(array("+", "/"), array("-", "_"), $md5_b64_str);
    // (Base64 includes two non-URL safe chars, so we replace them with safe ones)
    return $url_safe_str;
}

Basically you have 16 bytes (128 bits) of data in the MD5 hash string. It's 32 chars long because each byte is encoded as 2 hex digits (i.e. 00-FF). So we break the string up into bytes and build a raw 16-byte string from them. Because that string is no longer human-readable or valid ASCII, we base-64 encode it back to readable chars. Base-64 results in a 4/3 expansion: each output char carries only 6 bits, so it takes 4 chars (32 bits of text) to encode 3 bytes (24 bits), and the 16 bytes need 22 chars. Because base-64 output is padded to a multiple of 4 chars, the encoder actually returns 24 chars, the last 2 of which are "=" padding, so we keep only the first 22. Then we replace the two non-URL-safe characters used by base-64 ("+" and "/") with URL-safe equivalents ("-" and "_").
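
For comparison, here is a sketch of the same steps written with PHP built-ins. The name compress_md5_compact is made up for illustration, and it assumes PHP 5.4 or later (for hex2bin()); it is meant to behave like the function above, not to replace it.

```php
<?php
/**
 * Hypothetical compact variant of compress_md5() (illustrative name).
 * Same steps: hex -> 16 raw bytes -> base-64 -> URL-safe chars, padding dropped.
 */
function compress_md5_compact($md5_hash_str) {
    // Reject anything that is not exactly 32 hex digits.
    if (!preg_match('/^[0-9a-fA-F]{32}\z/', $md5_hash_str)) {
        throw new InvalidArgumentException("expected 32 hex digits");
    }
    $md5_bin_str = hex2bin($md5_hash_str);            // 16 raw bytes
    $md5_b64_str = base64_encode($md5_bin_str);       // 24 chars, ending in "=="
    $url_safe_str = strtr($md5_b64_str, "+/", "-_");  // swap the two non-URL-safe chars
    return rtrim($url_safe_str, "=");                 // drop the padding: 22 chars
}

echo compress_md5_compact("a7d2cd9e0e09bebb6a520af48205ced1"), "\n";
// prints p9LNng4JvrtqUgr0ggXO0Q
```

If you are computing the hash in PHP yourself, md5($data, true) returns the 16 raw bytes directly, so the hex step can be skipped.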

This is fully reversible, but that is left as an exercise to the reader.
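
For anyone who wants to skip the exercise, a minimal sketch of the reverse direction might look like this. The name expand_md5 is hypothetical, and it assumes the input was produced by the steps above (22 chars, "-" and "_" in place of "+" and "/", padding removed).

```php
<?php
/** Hypothetical inverse of compress_md5(): 22-char URL-safe string -> 32-char hex string. */
function expand_md5($url_safe_str) {
    if (strlen($url_safe_str) !== 22) {
        throw new InvalidArgumentException("expected a 22-character string");
    }
    // Undo the URL-safe substitution, then restore the "==" padding that was dropped.
    $md5_b64_str = strtr($url_safe_str, "-_", "+/") . "==";
    // Strict mode makes base64_decode() return false on chars outside the alphabet.
    $md5_bin_str = base64_decode($md5_b64_str, true);
    if ($md5_bin_str === false || strlen($md5_bin_str) !== 16) {
        throw new InvalidArgumentException("not a valid compressed MD5 string");
    }
    return bin2hex($md5_bin_str); // 32 lowercase hex digits
}

echo expand_md5("p9LNng4JvrtqUgr0ggXO0Q"), "\n";
// prints a7d2cd9e0e09bebb6a520af48205ced1
```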

I think this is about the best you can do with printable, URL-safe characters: at 6 bits per char, 128 bits can't fit in fewer than 22 chars. If you don't care about human-readable/ASCII output, you can just use $md5_bin_str directly.
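
To put numbers on that, here is a small sketch of the arithmetic. The helper name min_chars_for_bits is made up for illustration. It computes how many chars are needed to hold 128 bits for a few alphabet sizes, including 66, which is the number of "unreserved" URL characters in RFC 3986 (letters, digits, "-", ".", "_", "~").

```php
<?php
/** Smallest string length that can represent every $bits-bit value (illustrative helper). */
function min_chars_for_bits($bits, $alphabet_size) {
    // Each char carries log2(alphabet_size) bits; round up to whole chars.
    return (int) ceil($bits / log($alphabet_size, 2));
}

foreach (array(16, 36, 62, 64, 66) as $alphabet_size) {
    printf("base %d: %d chars\n", $alphabet_size, min_chars_for_bits(128, $alphabet_size));
}
// base 16: 32 chars
// base 36: 25 chars
// base 62: 22 chars
// base 64: 22 chars
// base 66: 22 chars
```

So base 62 and base 64 come out the same length for a full MD5, which matches the EDIT in the question: adding upper- and lowercase letters only takes you from 32 chars to 22.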

You can also use a prefix or other subset of this function's result if you don't need to preserve all the bits. Throwing out data is obviously the simplest way to shorten things! (But then it's not reversible, and collisions become more likely.)
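
As a rough illustration of that trade-off, the sketch below keeps a 9-char prefix (the length of the example in the question) and estimates the collision risk with the usual birthday approximation. Both function names are hypothetical, and the estimate assumes the hashes behave like uniformly random values.

```php
<?php
/** Keep only the first $chars chars of a 22-char ID (6 bits per char). Not reversible. */
function short_prefix($url_safe_str, $chars) {
    return substr($url_safe_str, 0, $chars);
}

/** Birthday-approximation chance that at least two of $items random IDs collide. */
function collision_probability($items, $chars) {
    $id_space = pow(64.0, $chars);           // number of distinct prefixes of this length
    $pairs = $items * ($items - 1) / 2.0;    // number of pairs that could collide
    return -expm1(-$pairs / $id_space);      // 1 - e^(-pairs / id_space)
}

echo short_prefix("p9LNng4JvrtqUgr0ggXO0Q", 9), "\n";
// prints p9LNng4Jv
printf("%.1e\n", collision_probability(1000000, 9));
```

With 9 chars (54 bits) and a million stored hashes, the estimate comes out at roughly 3 in 100,000, so a short prefix is not collision-free and the lookup still has to handle the rare duplicate.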

P.S. for your input of "a7d2cd9e0e09bebb6a520af48205ced1" (32 chars), this function will return "p9LNng4JvrtqUgr0ggXO0Q" (22 chars).

dkamins answered Oct 10 '22 23:10