Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perfect hash function for a set of integers with no updates

In one of the applications I work on, it is necessary to have a function like this:

bool IsInList(int iTest)
{
   //Return if iTest appears in a set of numbers.
}

The number list is known at app load up (But are not always the same between two instances of the same application) and will not change (or added to) throughout the whole of the program. The integers themselves maybe large and have a large range so it is not efficient to have a vector<bool>. Performance is a issue as the function sits in a hot spot. I have heard about Perfect hashing but could not find out any good advice. Any pointers would be helpful. Thanks.

p.s. I'd ideally like if the solution isn't a third party library because I can't use them here. Something simple enough to be understood and manually implemented would be great if it were possible.

like image 481
nakiya Avatar asked Nov 19 '10 13:11

nakiya


1 Answers

I would suggest using Bloom Filters in conjunction with a simple std::map.

Unfortunately the bloom filter is not part of the standard library, so you'll have to implement it yourself. However it turns out to be quite a simple structure!

A Bloom Filter is a data structure that is specialized in the question: Is this element part of the set, but does so with an incredibly tight memory requirement, and quite fast too.

The slight catch is that the answer is... special: Is this element part of the set ?

  • No
  • Maybe (with a given probability depending on the properties of the Bloom Filter)

This looks strange until you look at the implementation, and it may require some tuning (there are several properties) to lower the probability but...

What is really interesting for you, is that for all the cases it answers No, you have the guarantee that it isn't part of the set.

As such a Bloom Filter is ideal as a doorman for a Binary Tree or a Hash Map. Carefully tuned it will only let very few false positive pass. For example, gcc uses one.

like image 185
Matthieu M. Avatar answered Sep 26 '22 02:09

Matthieu M.