Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast bi-directional hash of two integers in C

I am writing a Linux kernel module and I need to come up with a hashing function that takes two integers for input. Because the code runs in kernel space, none of the standard libraries are available to me.

Basically, I need a hashing function where:

hash(a, b) = c
hash(b, a) = c

Where acceptable inputs for a and b are unsigned 32-bit integers. The hashing function should return an unsigned 64-bit integer. Collision (i.e. hash(a, b) = c and hash(d, f) = c as well) is not desirable as these values will be used in a binary search tree. The result of the search is a linked list of possible results that is then iterated over where a and b are actually compared. So some collision is acceptable, but the less collisions, the less iterations required, and the faster it will run.

Performance is also of extreme importance, this lookup will be used for every packet received in a system as I am writing a firewall application (the integers are actually packet source and destination addresses). This function is used to lookup existing network sessions.

Thank you for your time.

like image 251
Blake Beaupain Avatar asked Aug 02 '12 22:08

Blake Beaupain


5 Answers

Pseudocode of how you can do it:

if a>b
  return (a << 32) | b;
else 
  return (b << 32) | a;

This satisfies hash(a,b) == hash(b,a), utilizes the full 64 bit space, and shouldn't have collisions ...I think :)

Be careful to not directly shift the 32bit variables. Use intermediate 64-bit buffers or inline casts instead:

uint64_t myhash(uint32_t a, uint32_t b)
{
    uint64 a64 = (uint64_t) a;
    uint64 b64 = (uint64_t) b;
    return (a > b) ? ((a64 << 32) | b64) : ((b64 << 32) | a64);
}
like image 148
hatchet - done with SOverflow Avatar answered Nov 17 '22 08:11

hatchet - done with SOverflow


#define MYHASH(a,b) ( (((UINT64) max(a,b)) << 32) | ((UINT64) min(a,b)) )
like image 44
Claudix Avatar answered Nov 17 '22 09:11

Claudix


((a | b) << 32) + (a & b)

is commutative and should lead to a minimum number of collisions. I have to think more about it though ...

like image 30
log0 Avatar answered Nov 17 '22 08:11

log0


How about ((uint64_t)max(a, b) << UINT64_C(32)) | (uint64_t)min(a, b))? This would avoid collisions entirely, as there is no possible overlap between inputs. I can't speak to the distribution though, as that depends on your input values.

like image 1
Jonathan Grynspan Avatar answered Nov 17 '22 09:11

Jonathan Grynspan


(a ^ b) | ((a ^ ~b) <<32);

like image 1
wildplasser Avatar answered Nov 17 '22 08:11

wildplasser