Is there any hash function with uniq hash code (like MD5) with order preserving?
NOTE: i don't care about security, i need it for sorting, i have lot of chunks with (~1MB size) and i want to sort them, of course i can use index sort but i want to reduce time of compare
Theoreticaly: if i have 1'000'000 chunks with 1MB size (1'048'576 byte) and all of them have difference in last 10 bytes then time of compare of one chunk to other will be O(n-10) and if i will use QuictSort (which make ~(nlog2(n)) compares) then total time of compare will be nlog2(n)*(k-10) (where k is chunk size) 1'000'000 * 20 * (1'048'576 - 10)
that's why i want to generate order preserved hash codes with fixed size (for example 16 bytes) once then sort chunks and save result (for example: in file)
According to NIST (I'm no expert) a Pearson hash can be order-preserving. The hash uses an auxiliary table. Such a table can (in theory) be constructed such that the resulting hash is order preserving.
It doesn't meet your full requirements though, because it doesn't reduce the size as you would like. I'm posting this in case other people are looking for a solution.
Some pointers:
In general case, such a function is impossible unless the size of the hash is at least the size of the object.
The argument is trivial: if there are N objects but M < N hash values, by pigeonhole principle, two different objects are mapped to one hash value, and so their order is not preserved.
If however we have additional properties of the objects guaranteed or the requirements relaxed, a custom or probabilistic solution may become possible.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With