I understand that this may be vague question, but I wonder what are real world cases when custom comparator is useful for hash containers in std. I understand it's usefulness in ordered containers, but for hash containers it seems a bit weird.
Reason for this is that hash value for elements that are equal according to comparator needs to be the same, and I believe that in most cases that actually means converting lookup/insert element to some common representation(it is faster and easier to implement).
For example:
So I wonder if someone can provide some real world examples of usefulness of customizing std::unordered_
template parameters so I can recognize those patterns in future code I write.
Note 1: "symmetry argument" (std::map
enables customization of a comparator so std::unordred_
should be customizable also) is something I considered and I do not think it is convincing.
Note 2: I mixed 2 kind of comparators (<
and ==
) in the post for brevity, I know that std::map
uses <
and std::unordered_map
uses ==
.
As per https://en.cppreference.com/w/cpp/container/unordered_set
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.
So the hash function defines the bucket your element will end up in, but once the bucket is decided, in order to find the element, the operator ==
will be used.
Basically operator ==
is used to resolve hash collision, and hence, you need your hash function and your operator ==
to be consistent. Furthermore, if your operator operator ==
says that two elements are equal, the set will not allow a duplication.
For what concerns customization, I think that the idea of case-insensitive set
of string
s is a good one: given two strings you will need to provide a case-insensitive hash-function to allow the set
to determine the bucket it has to store the string in. Then you will need to provide a custom KeyEqual
to allow the set to actually retrieve the element.
A case I had to deal with, in the past, was a way to allow users to insert strings, keeping track of their order of insertion but avoiding duplicates. So, given a struct like:
struct keyword{
std::string value;
int sequenceCounter;
};
You want to detect duplicates according only to value
. One of the solutions I came up with was an unordered_set
with a custom comparator/hash function, that used only value
. This allowed me to check for the existence of a key before allowing insertion.
One interesting usage is to define memory efficient indexes (database sense of the term) for a given set of objects.
Let's say we have a program that has a collection of N
objects of this class:
struct Person {
// each object has a unique firstName/lastName pair
std::string firstName;
std::string lastName;
// each object has a unique ssn value
std::string socialSecurityNumber;
// each object has a unique email value
std::string email;
}
And we need to retrieve efficiently objects by the value of any unique property.
Time complexities are given assuming string comparisons are constant time (strings have limited length).
unordered_map
With a single map
indexed by a single key (ex: email
):
std::unordered_map<std::string,Person> indexedByEmail;
email
requires a traversal of the map: average O(N).email
value is duplicated. This could be avoided by using a single set
with custom hash & compare (see 3).unordered_map
, no custom hash & compareWith a map for each unique property, with default hash & comparisons:
std::unordered_map<std::pair<std::string,std::string>, Person> byName;
std::unordered_map<std::string, const Person*> byEmail;
std::unordered_map<std::string, const Person*> bySSN;
string
duplications.unordered_set
, custom hash & comparison:With custom hash & comparison, we define different unordered_set
which will hash & compare only specific fields of the objects. Theses sets can be used to perform lookup as if items were stored in a map
as in 2, but without duplicating any field.
using StrHash = std::hash<std::string>;
// --------------------
struct PersonNameHash {
std::size_t operator()(const Person& p) const {
// not the best hashing function in the world, but good enough for demo purposes.
return StrHash()(p.firstName) + StrHash()(p.lastName);
}
};
struct PersonNameEqual {
bool operator()(const Person& p1, const Person& p2) const {
return (p1.firstName == p2.firstName) && (p1.lastName == p2.lastName);
}
};
std::unordered_set<Person, PersonNameHash, PersonNameEqual> byName;
// --------------------
struct PersonSsnHash {
std::size_t operator()(const Person* p) const {
return StrHash()(p->socialSecurityNumber);
}
};
struct PersonSsnEqual {
bool operator()(const Person* p1, const Person* p2) const {
return p1->socialSecurityNumber == p2->socialSecurityNumber;
}
};
std::unordered_set<const Person*, PersonSsnHash, PersonSsnEqual> bySSN;
// --------------------
struct PersonEmailHash {
std::size_t operator()(const Person* p) const {
return StrHash()(p->email);
}
};
struct PersonEmailEqual {
bool operator()(const Person* p1, const Person* p2) const {
return p1->email == p2->email;
}
};
std::unordered_set<const Person*,PersonEmailHash,PersonEmailEqual> byEmail;
string
duplication.Live demo
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With