Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Defining a c++20 concept for hash functions

The CppCoreGuidelines state that concepts should be specified for all template arguments. (see:T.10: Specify concepts for all template arguments) As practice for defining concepts, I am trying to build a hashtable with concepts defined for the Hash function and Key template arguments.

I want my hashtable to use two template arguments, HashFunc and Key. HashFunc should be a function object and Key should be the argument for the function object HashFunc.

That is, HashFunc(Key) should return a type convertible to size_t.

On cppreference, there is an example defining the concept Hashable. I replicated the example below:

template<typename T>
concept Hashable = requires(T a) {
    { std::hash<T>{}(a) } -> std::convertible_to<std::size_t>;
};

This Hashable concept makes sense for many uses. In these cases, hash functions on object of type T are specializations std::hash<T>. However, for my purposes, I don't want to assume that the Hash will be std::hash<Key>. I would like the user to be able to provide a different hash function.

Since HashFunc and Key are so tightly bound, I don't think I can define separate concepts for HashFunc and Key. Is that correct? So I would like to define a concept HashConcept that deals with HashFunc and Key simultaneously.

So I define one concept Hash that deals with both. I try my best to define the concept so that it matches the named requirement for Hash here. The goal then is to satisfy 4 conditions. Below this list, I talk about trying to enforce these conditions.

  1. The return type is convertible to std::size_t.
  2. The hash function displays equality preservation (h(k1) == h(k1) for the duration of the program. see C++ Extensions for Ranges section 19.1.1)
  3. If u is an lvalue Key, then h(u) does not modify u.
  4. "The probability of h(a)==h(b) for a!=b should approach 1.0/std::numeric_limits<std::size_t>::max()."

Does this list appear complete?

I don't believe concepts can enforce (4), and (4) will just need to be indicated in comments/documentation. I believe that concepts might be able to enforce (2) and (3), but I'm not sure how. C++ Extensions for Ranges section 19.5 defines the concepts Callable and RegularCallable, then says "Note: The distinction between Callable and RegularCallable is purely semantic. — end note", suggesting that (2) cannot be enforced. That leaves (1) and (3).

I define a concept that enforces (1).

template<typename HashFunc, typename Key>
concept Hash = requires(HashFunc h, Key k) {
    { std::invoke(h, k) } -> std::convertible_to<std::size_t>;
};

Is this concept correct? (e.g., should I have used requires or returned a bool?) Can my concept be extended to address other requirements for hash functions, such as (2)-(4)?

Below is some example code that uses the concept. The result is to print 3 to the std::cout.

#include <functional>
#include <concepts>
#include <iostream>

template<typename HashFunc, typename Key>
concept HashConcept = requires(HashFunc h, Key k) {
    { std::invoke(h, k) } -> std::convertible_to<std::size_t>;
};


class HashFunc {
public:
    std::size_t operator()(int i) {
        return static_cast<size_t>(i);
    }
};

template<typename Hash, typename Key>
    requires HashConcept<Hash, Key>
size_t HashConceptUser(Hash h, Key k) {
    return h(k);
}

int main() {
    std::cout << HashConceptUser< HashFunc, int >(HashFunc{}, 3); 

}
like image 990
mana Avatar asked Dec 03 '20 14:12

mana


People also ask

How do you write a hash function?

With modular hashing, the hash function is simply h(k) = k mod m for some m (usually, the number of buckets). The value k is an integer hash code generated from the key. If m is a power of two (i.e., m=2p), then h(k) is just the p lowest-order bits of k.

Which of the below is rules for choosing hash function?

Choosing a good hashing function, h(k), is essential for hash-table based searching. h should distribute the elements of our collection as uniformly as possible to the "slots" of the hash table. The key criterion is that there should be a minimum number of collisions. will provide uniform hashing.

What is Hashtable C++?

A hash table is a data structure which is used to store key-value pairs. Hash function is used by hash table to compute an index into an array in which an element will be inserted or searched. This is a C++ program to Implement Hash Tables.

What is a hash function in C?

The hash function is a function that uses the constant-time operation to store and retrieve the value from the hash table, which is applied on the keys as integers and this is used as the address for values in the hash table. Types of a Hash Function In C The types of hash functions are explained below: 1.

What is Hashhash and how does it work?

Hash functions are also referred to as hashing algorithms or message digest functions. They are used across many areas of computer science, for example: To encrypt communication between web servers and browsers, and generate session ID s for internet applications and data caching

What is a hash value?

A hash value is the output string generated by a hash function. No matter the input, all of the output strings generated by a particular hash function are of the same length. The length is defined by the type of hashing technology used. The output strings are created from a set of authorized characters defined in the hash function.

How to calculate 2 hash functions to resolve collision problem?

This method we have to calculate 2 hash functions to resolve the collision problem. The first is calculated using a simple division method. Second has to satisfy two rules; it must not be equal to 0 and entries must be probed. 1 (key) = key % size of the table.


1 Answers

Does this list appear complete?

The list is missing arguably the single most important criteria for a hash function: that if a == b then h(a) == h(b).

The 4th criterion on the list is something you want for good hash functions, and is itself somewhat incomplete - you don't just want the likelihood of collision to be small, you also want random dispersion. The hash function h(i) = i satisfies the 4th criterion, but not a good hash function. On the flip side, h(i) = 0 is a terrible hash function but should be considered valid.


That said, C++ language concepts cannot enforce any of these things - you cannot enforce that the hash function is equality-preserving, you cannot enforce that it doesn't modify its inputs, and you cannot enforce anything about the distribution of its results. Those are what we would call semantic constraints rather than syntactic ones (the C++ standard speaks of satsifying a concept if the syntactic constraints are met and modeling a concept if the syntactic and semantic ones are met). The semantic constraints are documented requirements (in comments, or just documentation) rather than coded ones.

The best you can do the syntax is just verify that the hash function is invocable and gives you an integer:

template <typename F, typename T>
concept HashFor = std::regular_invocable<F, T>
               && std::convertible_to<std::invoke_result_t<F, T>, size_t>;

I am using regular_invocable here because that concept adds semantic constraints that you want: that the function call is equality-preserving and does not modify the function object or its arguments. You could also write it this way:

template <typename F, typename T>
concept HashFor = std::regular_invocable<F, T>
    && requires(F f, T t) {
        { std::invoke(f, t) } -> std::convertible_to<size_t>;
    };

But I would keep the regular_invocable part.

like image 100
Barry Avatar answered Oct 19 '22 21:10

Barry