Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ unordered_map<string, ...> lookup without constructing string

I have C++ code that investigates a BIG string and matches lots of substrings. As much as possible, I avoid constructing std::strings, by encoding substrings like this:

char* buffer, size_t bufferSize

At some point, however, I'd like to look up a substring in one of these:

std::unordered_map<std::string, Info> stringToInfo = {...

So, to do that, I go:

stringToInfo.find(std::string(buffer, bufferSize))

That constructs a std::string for the sole purpose of the lookup.

I feel like there's an optimization I could do here, by... changing the key-type of the unordered_map to some kind of temporary string imposter, a class like this...

class SubString
{
    char* buffer;
    size_t bufferSize;

    // ...
};

... that does the same logic as std::string to hash and compare, but then doesn't deallocate its buffer when it's destroyed.

So, my question is: is there a way to get the standard classes to do this, or do I write this class myself?

like image 831
2-complex Avatar asked Apr 07 '18 16:04

2-complex


1 Answers

What you're wanting to do is called heterogeneous lookup. Since C++14 it's been supported for std::map::find and std::set::find (note versions (3) and (4) of the functions, which are templated on the lookup value type). It's more complicated for unordered containers because they need to be told of or find hash functions for all key types that will produce the same hash value for the same text. There's a proposal under consideration for a future Standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r0.html

Meanwhile, you could use another library that already supports heterogenous lookup, e.g. boost::unordered_map::find.

If you want to stick to std::unordered_map, you could avoid creating so many string temporaries by storing a std::string member alongside your unordered_map that you can reassign values to, then pass that string to find. You could encapsulate this in a custom container class.

Another route is to write a custom class to use as your unordered container key:

struct CharPtrOrString
{
    const char* p_;
    std::string s_;

    explicit CharPtrOrString(const char* p) : p_{p} { }
    CharPtrOrString(std::string s) : p_{nullptr}, s_{std::move(s)} { }

    bool operator==(const CharPtrOrString& x) const
    {
        return p_ ? x.p_ ? std::strcmp(p_, x.p_) == 0
                         : p_ == x.s_
                  : x.p_ ? s_ == x.p_
                         : s_ == x.s_;
    }

    struct Hash
    {
        size_t operator()(const CharPtrOrString& x) const
        {
            std::string_view sv{x.p_ ? x.p_ : x.s_.c_str()};
            return std::hash<std::string_view>()(sv);
        } 
    };
};

You can then construct CharPtrOrString from std::strings for use in the unordered container keys, but construct one cheaply from your const char* each time you call find. Note that operator== above has to work out which you did (convention used is that if the pointer's nullptr then the std::string member's in use) so it compares the in-use members. The hash function has to make sure a std::string with a particular textual value will produce the same hash as a const char* (which it doesn't by default with GCC 7.3 and/or Clang 6 - I work with both and remember one had an issue but not which).

like image 79
Tony Delroy Avatar answered Oct 24 '22 09:10

Tony Delroy