Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently hash the ip-address

Tags:

algorithm

hash

This is an interview question. I thought about some solution like multiway- hashing but could not find some thing elegant. Please suggest some good method.

Question: You have 10 million IP addresses. (IPv4 4 byte addresses). Create a hash function for these IP addresses.

Hint: Using the IP's themselves as a key is a bad idea because there will be a lot of wasted space

like image 412
Abhishek Gupta Avatar asked Nov 16 '13 07:11

Abhishek Gupta


1 Answers

Interesting, that such an interesting question did not have any interesting answer (sorry for tautology).

If you see it as a theoretical matter, then this link is what you need (there is even a superfast hash function written for you and ready to go):

http://www.kfki.hu/~kadlec/sw/netfilter/ct3/

Practical matter may be different. If your hash table is of reasonable size, you will have to handle collisions anyway (with linked lists). So ask yourself what use case will take place in the end? If your code will run within some secluded ecosystem, and the IP address is a-b-c-d, c and d are the most volatile numbers and d won't be null (assuming you don't handle networks), so a hash table of 64K buckets, and cd as a hash may well be satisfactory?

Another use case - TCP connection tracking where a client use ephemeral port that is assigned by kernel randomly (isn't it ideal for hashing?). The problem is the limited range: something like 32768-61000 which renders least significant byte more random than most significant byte. So you can XOR the most significant byte with the most volatile byte in IP address that can be zerro (c) and use it as a hash in your 64K table.

like image 113
wick Avatar answered Oct 20 '22 03:10

wick