Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Make std's data-structure use my existing non-static hash function "hashCode()" by default

I have a moderate-size codebase (>200 .cpp) that use a function hashCode() to return hash number:-

class B01{  //a class
    //..... complex thing ....
    public: size_t hashCode(){ /* hash algorithm #H01 */}  
};
class B02{  //just another unrelated class
    //..... complex thing ....
    public: size_t hashCode(){/* #H02 */}  //This is the same name as above
};

I have used it in various locations, e.g. in my custom data-structure. It works well.

Now, I want to make the hash algorithm recognized by std:: data structure:-

Here is what I should do :- (modified from cppreference, I will call this code #D).

//#D
namespace std {
    template<> struct hash<B01> {
        std::size_t operator()(const B01& b) const {
            /* hash algorithm #H01 */
        }
    };
}

If I insert the block #D (with appropriate implementation) in every class (B01,B02,...), I can call :-

std::unordered_set<B01> b01s;
std::unordered_set<B02> b02s;

without passing the second template argument,
and my hash algorithm (#H01) will be called. (by default)

Question

To make it recognize all of my B01::hashCode, B02::hashCode, ...,
do I have to insert the block #D into all 200+ Bxx.h?

Can I just add a single block #D (in a top header?)
and, from there, re-route std::anyDataStructure to call hashCode() whenever possible?

//pseudo code
namespace std{
    template<> struct hash<X>   {
        std::size_t operator()(const X& x) const { // std::enable_if??
            if(X has hashCode()){    //e.g. T=B01 or B02       
                make this template highest priority   //how?
                return hashCode();
            }else{                   //e.g. T=std::string
                don't match this template;  
            }
        }
    };
}

It sounds like a SFINAE question to me.

Side note: The most similar question in SO didn't ask about how to achieve this.

Edit (Why don't I just refactor it? ; 3 Feb 2017)

  • I don't know if brute force refactoring is a right path. I guess there might be a better way.
  • hashCode() is my home. I emotionally attach to it.
  • I want to keep my code short and clean as possible. std:: blocks are dirty.
  • It may be just my curiosity. If I stubborn not to refactor my code, how far C++ can go?
like image 445
javaLover Avatar asked Jan 26 '17 04:01

javaLover


3 Answers

It doesn't have to be that way, you can also have a functor:

struct MyHash {
    template <class T>
    auto hashCode(const T & t, int) const -> decltype(t.hashCode()) {
        return t.hashCode();
    }
    template <class T>
    auto hashCode(const T & t, long) const -> decltype(std::hash<T>{}(t)) {
        return std::hash<T>{}(t);
    }
    
    template <class T>
    auto operator()(const T & t) const -> decltype(hashCode(t,42)) {
        return hashCode(t,42);
    }
};

And have an alias of std::unordered_set with MyHash as hash type:

template <class Key>
using my_unordered_set = std::unordered_set<Key, MyHash>;

or more complete if you also want to be able to provide Equal functor and allocator:

template<
    class Key,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator<Key>
>
using my_unordered_set = std::unordered_set<Key, MyHash, KeyEqual, Allocator>;

Then using it (with any of your Bxx) like you'd use std::unordered_set:

int main() {
    my_unordered_set<B01> b01s;
    my_unordered_set<B02> b02s;

    // or lonely with your type:
    B01 b01{/*...*/};
    std::cout << MyHash{}(b01) << std::endl;

    // or any other:
    std::string str{"Hello World!"};
    std::cout << MyHash{}(str) << std::endl;
}

Concepts

If you can use concepts, they can allow you to specialize std::hash class the way you want:

template <class T>
concept HashCodeConcept = requires(T const & t)
{
    {t.hashCode()} -> std::same_as<std::size_t>;
};

namespace std {
    template <HashCodeConcept T>
    struct hash<T> {
        std::size_t operator()(const T& t) const {
            return  t.hashCode();
        }
    };
}
like image 145
O'Neil Avatar answered Nov 08 '22 10:11

O'Neil


A SFINAE based method of the type you were looking for requires partial specialisation of std::hash. This could be done if your classes Bxx are templates (which is the case if they are derived from a CRTP base). For example (note fleshed out in edit)

#include <type_traits>
#include <unordered_set>
#include <iostream>

template<typename T = void>
struct B {
  B(int i) : x(i) {}
  std::size_t hashCode() const
  {
    std::cout<<"B::hashCode(): return "<<x<<std::endl;
    return x;
  }
  bool operator==(B const&b) const
  { return x==b.x; }
private:
  int x;
};

template<typename T,
         typename = decltype(std::declval<T>().hashCode())> 
using enable_if_has_hashCode = T;

namespace std {
  template<template<typename...> class T, typename... As> 
  struct hash<enable_if_has_hashCode<T<As...>>> 
  {
    std::size_t operator()(const T<As...>& x) const
    { return x.hashCode(); }
  };
  // the following would not work, as its not a partial specialisation
  //    (some compilers allow it, but clang correctly rejects it)
  // tempate<typename T>
  // struct hash<enable_if_hashCode<T>>
  // { /* ... */ }; 
}

int main()
{
  using B00 = B<void>;
  B00 b(42);
  std::unordered_set<B00> set;
  set.insert(b);
}

produces (using clang++ on MacOS)

B::hashvalue(): return 42

see also this related answer to a similar question of mine.

However, concepts are the way of the future to solve problems like this.

like image 7
Walter Avatar answered Nov 08 '22 12:11

Walter


While creating conditions to default the hash parameter of std container templates to member methods of groups of classes, one should avoid introducing new issues.

  • Redundancy
  • Portability problems
  • Arcane constructs

The classic object oriented approach may require a patterned edit of the 200+ classes to ensure they provide the basics of std::hash container use. Some options for group transformation are given below to provide the two needed methods.

  • A public hashCode() is defined in the concrete class where it is unique to that class or by inheritance if it follows a pattern common across classes.
  • A public operator==() is defined.

The Two Templates

These two templates will remove the redundancy and simplify the declaration as indicated.

template <typename T>
    struct HashStruct {
        std::size_t operator()(const T & t) const {
            return t.hashCode();
        } };
template <class T>
    using SetOfB = std::unordered_set<T, HashStruct<T>>;

Saving Integration Time

An example super-class:

class AbstractB {
    ...
    virtual std::size_t hashCode() const {
        return std::hash<std::string>{}(ms1)
                ^ std::hash<std::string>{}(ms2);
    } }

The following sed expression may save transformation time, assuming the code uses { inline. Similar expressions would work with Boost or using a scripting language like Python.

"s/^([ \t]*class +B[a-zA-Z0-9]+ *)(:?)(.*)$"
        + "/\1 \2 : public AbstractB, \3 [{]/"
        + "; s/ {2,}/ /g"
        + "; s/: ?:/:/g"

An AST based tool would be more reliable. This explains how to use clang capabilities for code transformation. There are new additions such as this Python controller of C++ code transformation.

Discussion

There are several options for where the hash algorithm can reside.

  • A method of a std container declaration's abstract class
  • A method of a concrete class (such as #H01 in the example)
  • A struct template (generally counterproductive and opaque)
  • The default std::hash

Here's a compilation unit that provides a clean demonstration of the classic of how one might accomplish the desired defaulting and the other three goals listed above while offering flexibility in where the hash algorithm is defined for any given class. Various features could be removed depending on the specific case.

#include <string>
#include <functional>
#include <unordered_set>

template <typename T>
    struct HashStructForPtrs {
        std::size_t operator()(const T tp) const {
            return tp->hashCode(); } };
template <class T>
    using SetOfBPtrs = std::unordered_set<T, HashStructForPtrs<T>>;

template <typename T>
    struct HashStruct {
        std::size_t operator()(const T & t) const {
            return t.hashCode(); } };
template <class T>
    using SetOfB = std::unordered_set<T, HashStruct<T>>;

class AbstractB {
    protected:
        std::string ms;
    public:
        virtual std::size_t hashCode() const {
            return std::hash<std::string>{}(ms); }
        // other option: virtual std::size_t hashCode() const = 0;
        bool operator==(const AbstractB & b) const {
            return ms == b.ms; } };

class B01 : public AbstractB {
    public:
        std::size_t hashCode() const {
            return std::hash<std::string>{}(ms) ^ 1; } };

class B02 : public AbstractB {
    public:
        std::size_t hashCode() const {
            return std::hash<std::string>{}(ms) ^ 2; } };

int main(int iArgs, char * args[]) {

    SetOfBPtrs<AbstractB *> setOfBPointers;
    setOfBPointers.insert(new B01());
    setOfBPointers.insert(new B02());

    SetOfB<B01> setOfB01;
    setOfB01.insert(B01());

    SetOfB<B02> setOfB02;
    setOfB02.insert(B02());

    return 0; };
like image 8
Douglas Daseeco Avatar answered Nov 08 '22 11:11

Douglas Daseeco