Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a pushable/poppable hash function for stack-like objects?

Tags:

algorithm

hash

I know of rolling hash functions that are similar to a hash on a bounded queue. Is there anything similar for stacks?

My use case is that I am doing a depth first search of possible program traces (with loop unrolling, so these stacks can get biiiiig) and I need to identify branching via these traces. Rather than store a bunch of stacks of depth 1000 I want to hash them so that I can index by int. However, if I have stacks of depth 10000+ this hash is going to be expensive, so I want to keep track of my last hash so that when I push/pop from my stack I can hash/unhash the new/old item respectively.

In particular, I am looking for a hash h(Object, Hash) with an unhash u(Object, Hash) with the property that for object x to be hashed we have:

    u(x, h(x, baseHash)) = baseHash

Additionally, this hash shouldn't be commutative, since order matters.

One thought I had was matrix multiplication over GL(2, F(2^k)), maybe using a Cayley graph? For example, take two invertible matrices A_0, A_1, with inverses B_0 and B_1, in GL(2, F(2^k)), and compute the hash of an object x by first computing some integer hash with bits b31b30...b1b0, and then compute

H(x) = A_b31 . A_b30 . ... . A_b1 . A_b0

This has an inverse

U(x) = B_b0 . B_b1 . ... . B_b30 . B_31.

Thus the h(x, baseHash) = H(x) . baseHash and u(x, baseHash) = U(x) . baseHash, so that

u(x, h(x, base)) = U(x) . H(x) . base = base,

as desired.

This seems like it might be more expensive than is necessary, but for 2x2 matrices it shouldn't be too bad?

like image 939
Ben Kushigian Avatar asked Apr 24 '18 02:04

Ben Kushigian


1 Answers

Most incremental hash functions can be made from two kinds of operations:

1) An invertible diffusion function that mixes up the previous hash. Invertible functions are chosen for this so that they don't loose information. Otherwise the hash would tend towards a few values; and

2) An invertible mixing function to mix new data into the hash. Invertible functions are used for this so that every part of the input has equivalent influence over the final hash value.

Since both these things are invertible, it's very easy to undo the last part of an incremental hash and "pop" off the previous value.

For instance, the most common kind of simple hash functions in use are polynomial hash functions. To update a previous hash value with a new input 'x', you calculate:

h' = h*A + x mod M

The multiplication is the diffusion function. In order for this to be invertible, A must have a multiplicative inverse mod M -- commonly either M is chosen to be prime, or M is a power of 2 and A is odd.

Because the multiplicative inverse exists, it's easy to pop off the last value from the hash, as long as you still have access to it:

h = (h' - x)*(1/A) mod M

You can use the extended Euclidean algorithm to find the inverse of A: https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm

Most other common non-cryptographic hashes, like CRCs, FNV, murmurHash, etc. are similarly easy to pop values off.

Some of these hashes have a final diffusion step after the incremental work, but that step is pretty much always invertible as well, to ensure that the hash can take on any value, so you can undo it to get back to the incremental part.

Diffusion operations are often made from sequences of primitive invertible operations. To undo them you would undo each operation in reverse order. Some of the common types you'll see are:

  • cyclic shifts
  • invertible multiplication (as above)
  • x = x XOR (x >> shift)
  • Feistel rounds (see https://simple.wikipedia.org/wiki/Feistel_cipher)

mixing operations are usually + or XOR.

like image 160
Matt Timmermans Avatar answered Nov 15 '22 06:11

Matt Timmermans