Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash Function with Order Preserving

Is there any hash function with uniq hash code (like MD5) with order preserving?

NOTE: i don't care about security, i need it for sorting, i have lot of chunks with (~1MB size) and i want to sort them, of course i can use index sort but i want to reduce time of compare

Theoreticaly: if i have 1'000'000 chunks with 1MB size (1'048'576 byte) and all of them have difference in last 10 bytes then time of compare of one chunk to other will be O(n-10) and if i will use QuictSort (which make ~(nlog2(n)) compares) then total time of compare will be nlog2(n)*(k-10) (where k is chunk size) 1'000'000 * 20 * (1'048'576 - 10)

that's why i want to generate order preserved hash codes with fixed size (for example 16 bytes) once then sort chunks and save result (for example: in file)

like image 702
Simon Avatar asked Jan 20 '15 11:01

Simon


2 Answers

According to NIST (I'm no expert) a Pearson hash can be order-preserving. The hash uses an auxiliary table. Such a table can (in theory) be constructed such that the resulting hash is order preserving.

It doesn't meet your full requirements though, because it doesn't reduce the size as you would like. I'm posting this in case other people are looking for a solution.

Some pointers:

  • The NIST page: http://xlinux.nist.gov/dads/HTML/pearsonshash.html
  • Wikipedia: http://en.wikipedia.org/wiki/Pearson_hashing
  • The original Pearson Hash paper: http://cs.mwsu.edu/~griffin/courses/2133/downloads/Spring11/p677-pearson.pdf
like image 180
L. Blanc Avatar answered Oct 17 '22 18:10

L. Blanc


In general case, such a function is impossible unless the size of the hash is at least the size of the object.

The argument is trivial: if there are N objects but M < N hash values, by pigeonhole principle, two different objects are mapped to one hash value, and so their order is not preserved.

If however we have additional properties of the objects guaranteed or the requirements relaxed, a custom or probabilistic solution may become possible.

like image 24
Gassa Avatar answered Oct 17 '22 18:10

Gassa