Summary: Is there a way to do that? Here's what I mean: suppose I have an unsigned int number. Then I multiply it several times(and there's overflow, which is expected). Then is it possible to "revert" the original value back?
In details:
It's all about Rabin-Karp rolling hash. What I need to do is: I have the hash of a long string - for example: "abcd". Then I have the hash for a shorter substring - for example "cd". How to calculate the "ab" hash with O(1), using the two given hashes?
What I have now as an algorithm:
p ^ len( "cd" )
, where p
is the base (prime number). So this is:
a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0
- abcd
c * p ^ 1 + d * p ^ 0
- cd
ab gets:
( ( a * p ^ 3 + b * p ^ 2 + c * p ^ 1 + d * p ^ 0 ) - ( c * p ^ 1 + d * p ^ 0 ) ) / ( p ^ 2 ) = a * p ^ 1 + b * p ^ 0
And this works, if I don't have an overflow (if p
is small number). But if it's not - it's not working.
Is there any trick or something?
P.S. The c++
tag is because of the number's overflow, as it is specific (and different from python, scheme or sth)
Don't know about the overflow part, but there is a way of getting back the original value.
The Chinese Remainder Theorem help a great deal. Let's call h = abcd - cd
. G is the value, h
, without overflows, G = h + k*2^32
, assuming the overflow simply does %2^32
. And thus ab = G / p^2
.
G = h (mod 2^32)
G = 0 (mod p^2)
If p^2 and 2^32 are coprime. This page on Chinese Remainder Theorem, gives us
G = h * b * p^2 (mod 2^32 * p^2)
Where b
is modular multiplicative inverse of p^2 modulo 2^32, b * p^2 = 1 (mod 2^32)
. After you calculate G
, simply divide by p^2
to find ab
.
I hope I didn't make any mistakes...
Extended Euclidean algorithm is a good solution for this, but it's too complicated and hard to implement. There's a better one.
And there's another way to do this (thanks to e friend of mine (: )
There's a nice article in wikipedia - modular multiplicative inverse using Euler's theorem in the case, when m
and a
are coprime:
where φ(m)
is Euler's totient function.
In my case, the m
(modulo) is the size of the hash type - 2^32
, 2^64
, etc. (64bit in my case).
Well, this means, that we should only find the value of φ(m)
. But think about that - m == 2 ^ 64
so, that gives us the guarantee that m
will be coprime with all odd numbers and will NOT be coprime any even number. So, what we need to do is to get the number of all values and divide them by 2.
Also, we know that m
will be unsigned, as otherwise we will have some issues. Than this gives us the chance to do this:
hash_t x = -1;
x /= 2;
hash_t a_reverse = fast_pow( a, x );
Well, about 64bit numbers, x
is really big number ( 19 digits: 9 223 372 036 854 775 807
), but fast_pow
is really fast and we could cache the reverse number, in case that we need for more than one query.
fast_pow
is a well-known algorithm:
hash_t fast_pow( hash_t source, hash_t pow )
{
if( 0 == pow )
{
return 1;
}
if( 0 != pow % 2 )
{
return source * fast_pow( source, pow - 1 );
}
else
{
return fast_pow( source * source, pow / 2 );
}
}
Addition: for example:
hash_t base = 2305843009213693951; // 9th mersenne prime
hash_t x = 1234567890987654321;
x *= fast_pow( base, 123456789 ); // x * ( base ^ 123456789 )
hash_t y = -1;
y /= 2;
hash_t base_reverse = fast_pow( base, y );
x *= fast_pow( base_reverse, 123456789 ); // x * ( base_reverse ^ 123456789 )
assert( x == 1234567890987654321 ) ;
works perfect and very fast.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With