Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP best way to MD5 multi-dimensional array?

People also ask

How can I get multidimensional array in PHP?

Accessing multidimensional array elements: There are mainly two ways to access multidimensional array elements in PHP. Elements can be accessed using dimensions as array_name['first dimension']['second dimension']. Elements can be accessed using for loop. Elements can be accessed using for each loop.

How do I flatten a multidimensional array in PHP?

Use the array_walk_recursive to Flatten a Multidimensional Array in PHP. The built-in function array_walk_recursive can be used with a closure function to flatten a multidimensional array in PHP.

What is multidimensional associative array in PHP?

PHP Multidimensional array is used to store an array in contrast to constant values. Associative array stores the data in the form of key and value pairs where the key can be an integer or string. Multidimensional associative array is often used to store data in group relation.


(Copy-n-paste-able function at the bottom)

As mentioned prior, the following will work.

md5(serialize($array));

However, it's worth noting that (ironically) json_encode performs noticeably faster:

md5(json_encode($array));

In fact, the speed increase is two-fold here as (1) json_encode alone performs faster than serialize, and (2) json_encode produces a smaller string and therefore less for md5 to handle.

Edit: Here is evidence to support this claim:

<?php //this is the array I'm using -- it's multidimensional.
$array = unserialize('a:6:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}i:5;a:5:{i:0;a:0:{}i:1;a:4:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}i:3;a:6:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}i:5;a:5:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}}}');

//The serialize test
$b4_s = microtime(1);
for ($i=0;$i<10000;$i++) {
    $serial = md5(serialize($array));
}
echo 'serialize() w/ md5() took: '.($sTime = microtime(1)-$b4_s).' sec<br/>';

//The json test
$b4_j = microtime(1);
for ($i=0;$i<10000;$i++) {
    $serial = md5(json_encode($array));
}
echo 'json_encode() w/ md5() took: '.($jTime = microtime(1)-$b4_j).' sec<br/><br/>';
echo 'json_encode is <strong>'.( round(($sTime/$jTime)*100,1) ).'%</strong> faster with a difference of <strong>'.($sTime-$jTime).' seconds</strong>';

JSON_ENCODE is consistently over 250% (2.5x) faster (often over 300%) -- this is not a trivial difference. You may see the results of the test with this live script here:

  • http://nathanbrauer.com/playground/serialize-vs-json.php
  • http://nathanbrauer.com/playground/plain-text/serialize-vs-json.php

Now, one thing to note is array(1,2,3) will produce a different MD5 as array(3,2,1). If this is NOT what you want. Try the following code:

//Optionally make a copy of the array (if you want to preserve the original order)
$original = $array;

array_multisort($array);
$hash = md5(json_encode($array));

Edit: There's been some question as to whether reversing the order would produce the same results. So, I've done that (correctly) here:

  • http://nathanbrauer.com/playground/json-vs-serialize.php
  • http://nathanbrauer.com/playground/plain-text/json-vs-serialize.php

As you can see, the results are exactly the same. Here's the (corrected) test originally created by someone related to Drupal:

  • http://nathanjbrauer.com/playground/drupal-calculation.php
  • http://nathanjbrauer.com/playground/plain-text/drupal-calculation.php

And for good measure, here's a function/method you can copy and paste (tested in 5.3.3-1ubuntu9.5):

function array_md5(Array $array) {
    //since we're inside a function (which uses a copied array, not 
    //a referenced array), you shouldn't need to copy the array
    array_multisort($array);
    return md5(json_encode($array));
}

md5(serialize($array));

I'm joining a very crowded party by answering, but there is an important consideration that none of the extant answers address. The value of json_encode() and serialize() both depend upon the order of elements in the array!

Here are the results of not sorting and sorting the arrays, on two arrays with identical values but added in a different order (code at bottom of post):

    serialize()
1c4f1064ab79e4722f41ab5a8141b210
1ad0f2c7e690c8e3cd5c34f7c9b8573a

    json_encode()
db7178ba34f9271bfca3a05c5dddf502
c9661c0852c2bd0e26ef7951b4ca9e6f

    Sorted serialize()
1c4f1064ab79e4722f41ab5a8141b210
1c4f1064ab79e4722f41ab5a8141b210

    Sorted json_encode()
db7178ba34f9271bfca3a05c5dddf502
db7178ba34f9271bfca3a05c5dddf502

Therefore, the two methods that I would recommend to hash an array would be:

// You will need to write your own deep_ksort(), or see
// my example below

md5(   serialize(deep_ksort($array)) );

md5( json_encode(deep_ksort($array)) );

The choice of json_encode() or serialize() should be determined by testing on the type of data that you are using. By my own testing on purely textual and numerical data, if the code is not running a tight loop thousands of times then the difference is not even worth benchmarking. I personally use json_encode() for that type of data.

Here is the code used to generate the sorting test above:

$a = array();
$a['aa'] = array( 'aaa'=>'AAA', 'bbb'=>'ooo', 'qqq'=>'fff',);
$a['bb'] = array( 'aaa'=>'BBBB', 'iii'=>'dd',);

$b = array();
$b['aa'] = array( 'aaa'=>'AAA', 'qqq'=>'fff', 'bbb'=>'ooo',);
$b['bb'] = array( 'iii'=>'dd', 'aaa'=>'BBBB',);

echo "    serialize()\n";
echo md5(serialize($a))."\n";
echo md5(serialize($b))."\n";

echo "\n    json_encode()\n";
echo md5(json_encode($a))."\n";
echo md5(json_encode($b))."\n";



$a = deep_ksort($a);
$b = deep_ksort($b);

echo "\n    Sorted serialize()\n";
echo md5(serialize($a))."\n";
echo md5(serialize($b))."\n";

echo "\n    Sorted json_encode()\n";
echo md5(json_encode($a))."\n";
echo md5(json_encode($b))."\n";

My quick deep_ksort() implementation, fits this case but check it before using on your own projects:

/*
* Sort an array by keys, and additionall sort its array values by keys
*
* Does not try to sort an object, but does iterate its properties to
* sort arrays in properties
*/
function deep_ksort($input)
{
    if ( !is_object($input) && !is_array($input) ) {
        return $input;
    }

    foreach ( $input as $k=>$v ) {
        if ( is_object($v) || is_array($v) ) {
            $input[$k] = deep_ksort($v);
        }
    }

    if ( is_array($input) ) {
        ksort($input);
    }

    // Do not sort objects

    return $input;
}

Answer is highly depends on data types of array values. For big strings use:

md5(serialize($array));

For short strings and integers use:

md5(json_encode($array));

4 built-in PHP functions can transform array to string: serialize(), json_encode(), var_export(), print_r().

Notice: json_encode() function slows down while processing associative arrays with strings as values. In this case consider to use serialize() function.

Test results for multi-dimensional array with md5-hashes (32 char) in keys and values:

Test name       Repeats         Result          Performance     
serialize       10000           0.761195 sec    +0.00%
print_r         10000           1.669689 sec    -119.35%
json_encode     10000           1.712214 sec    -124.94%
var_export      10000           1.735023 sec    -127.93%

Test result for numeric multi-dimensional array:

Test name       Repeats         Result          Performance     
json_encode     10000           1.040612 sec    +0.00%
var_export      10000           1.753170 sec    -68.47%
serialize       10000           1.947791 sec    -87.18%
print_r         10000           9.084989 sec    -773.04%

Associative array test source. Numeric array test source.


Aside from Brock's excellent answer (+1), any decent hashing library allows you to update the hash in increments, so you should be able to update with each string sequentially, instead having to build up one giant string.

See: hash_update


md5(serialize($array));

Will work, but the hash will change depending on the order of the array (that might not matter though).


Note that serialize and json_encode act differently when it comes to numeric arrays where the keys don't start at 0, or associative arrays. json_encode will store such arrays as an Object, so json_decode returns an Object, where unserialize will return an array with exact the same keys.