What is the default hash code that Mathematica uses?

Tags:

hash

The online documentation says

Hash[expr] 
  gives an integer hash code for the expression expr.
Hash[expr,"type"]
  gives an integer hash code of the specified type for expr.

It also gives "possible hash code types":

"Adler32" Adler 32-bit cyclic redundancy check
"CRC32" 32-bit cyclic redundancy check
"MD2" 128-bit MD2 code
"MD5" 128-bit MD5 code
"SHA" 160-bit SHA-1 code
"SHA256" 256-bit SHA code
"SHA384" 384-bit SHA code
"SHA512" 512-bit SHA code

Yet none of these correspond to the default returned by Hash[expr].

So my questions are:

What method does the default Hash use?
Are there any other hash codes built in?

812

asked Oct 28 '10 03:10

2 Answers

The default hash algorithm is, more-or-less, a basic 32-bit hash function applied to the underlying expression representation, but the exact code is a proprietary component of the Mathematica kernel. It's subject to (and has) change between Mathematica versions, and lacks a number of desirable cryptographic properties, so I personally recommend you use MD5 or one of the SHA variants for any serious application where security matters. The built-in hash is intended for typical data structure use (e.g. in a hash table).

The named hash algorithms you list from the documentation are the only ones currently available. Are you looking for a different one in particular?

answered Sep 28 '22 17:09

Michael Pilat

I've been doing some reverse engeneering on 32 and 64 bit Windows version of Mathematica 10.4 and that's what I found:

32 BIT

It uses a Fowler–Noll–Vo hash function (FNV-1, with multiplication before) with 16777619 as FNV prime and ‭84696351‬ as offset basis. This function is applied on Murmur3-32 hashed value of the address of expression's data (MMA uses a pointer in order to keep one instance of each data). The address is eventually resolved to the value - for simple machine integers the value is immediate, for others is a bit trickier. The Murmur3-32 implementing function contains in fact an additional parameter (defaulted with 4, special case making behaving as in Wikipedia) which selects how many bits to choose from the expression struct in input. Since a normal expression is internally represented as an array of pointers, one can take the first, the second etc.. by repeatedly adding 4 (bytes = 32 bit) to the base pointer of the expression. So, passing 8 to the function will give the second pointer, 12 the third and so on. Since internal structs (big integers, machine integers, machine reals, big reals and so on) have different member variables (e.g. a machine integer has only a pointer to int, a complex 2 pointers to numbers etc..), for each expression struct there is a "wrapper" that combine its internal members in one single 32-bit hash (basically with FNV-1 rounds). The simplest expression to hash is an integer.

The murmur3_32() function has 1131470165 as seed, n=0 and other params as in Wikipedia.

So we have:

  hash_of_number = 16777619 * (84696351‬ ^ murmur3_32( &number ))

with " ^ " meaning XOR. I really didn't try it - pointers are encoded using WINAPI EncodePointer(), so they can't be exploited at runtime. (May be worth running in Linux under Wine with a modified version of EncodePonter?)

64 BIT

It uses a FNV-1 64 bit hash function with 0xAF63BD4C8601B7DF as offset basis and 0x100000001B3 as FNV prime, along with a SIP64-24 hash (here's the reference code) with the first 64 bit of 0x0AE3F68FE7126BBF76F98EF7F39DE1521 as k0 and the last 64 bit as k1. The function is applied to the base pointer of the expression and resolved internally. As in 32-bit's murmur3, there is an additional parameter (defaulted to 8) to select how many pointers to choose from the input expression struct. For each expression type there is a wrapper to condensate struct members into a single hash by means of FNV-1 64 bit rounds.

For a machine integer, we have:

    hash_number_64bit = 0x100000001B3 * (0xAF63BD4C8601B7DF ^ SIP64_24( &number ))

Again, I didn't really try it. Could anyone try?

Not for the faint-hearted

If you take a look at their notes on internal implementation, they say that "Each expression contains a special form of hash code that is used both in pattern matching and evaluation."

The hash code they're referring to is the one generated by these functions - at some point in the normal expression wrapper function there's an assignment that puts the computed hash inside the expression struct itself.

It would certainly be cool to understand HOW they can make use of these hashes for pattern matching purpose. So I had a try running through the bigInteger wrapper to see what happens - that's the simplest compound expression. It begins checking something that returns 1 - dunno what. So it executes

    var1 = 16777619 * (67918732 ^ hashMachineInteger(1));

with hashMachineInteger() is what we said before - including values.

Then it reads the length in bytes of the bigInt from the struct (bignum_length) and runs

    result = 16777619 * (v10 ^ murmur3_32(v6, 4 * v4));

Note that murmur3_32() is called if 4 * bignum_length is greater than 8 (may be related to the max value of machine integers $MaxMachineNumber 2^32^32 and by converse to what a bigInt is supposed to be).

So, the final code is

    if (bignum_length > 8){

    result = 16777619 * (16777619 * (67918732 ^ ( 16777619 * (84696351‬ ^ murmur3_32( 1, 4 )))) ^ murmur3_32( &bignum, 4 * bignum_length ));
    }

I've made some hypoteses on the properties of this construction. The presence of many XORs and the fact that 16777619 + 67918732 = 84696351‬ may make one think that some sort of cyclic structure is exploited to check patterns - i.e. subtracting the offset and dividing by the prime, or something like that. The software Cassandra uses the Murmur hash algorithm for token generation - see these images for what I mean with "cyclic structure". Maybe various primes are used for each expression - must still check.

Hope it helps

answered Sep 28 '22 15:09

JavierG

Related questions
                            
                                Data structure for finding nearby keys with similar bitvalues
                            
                                How do I access a value of a nested Perl hash?
                            
                                Checksumming: CRC or hash?
                            
                                SQL way to get the MD5 or SHA1 of an entire row
                            
                                Hashing a set of integers in an order independent way
                            
                                Which hashing algorithms are available in Android?
                            
                                Best Hash function for detecting data changes?
                            
                                Difference between encryption and hashing
                            
                                Handling a hash as a function argument
                            
                                Calculating a hash code for a large file in parallel
                            
                                How to set up form for a hash in Rails?
                            
                                What kind of password hash/encryption is this?
                            
                                Golang/App Engine - securely hashing a user's password
                            
                                What is the fastest way to map group names of numpy array to indices?
                            
                                Are there circumstances where a hash algorithm can be guaranteed unique?
                            
                                Using XSL to make a hash of XML file
                            
                                Optimal data structure for a special dictionary
                            
                                Converting md5 password hashes to PHP 5.5 password_hash()
                            
                                R fast single item lookup from list vs data.table vs hash
                            
                                Hashing function for four unsigned integers (C++)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With