I have C++ code that investigates a BIG string and matches lots of substrings. As much as possible, I avoid constructing std::strings, by encoding substrings like this:
char* buffer, size_t bufferSize
At some point, however, I'd like to look up a substring in one of these:
std::unordered_map<std::string, Info> stringToInfo = {...
So, to do that, I go:
stringToInfo.find(std::string(buffer, bufferSize))
That constructs a std::string for the sole purpose of the lookup.
I feel like there's an optimization I could do here, by... changing the key-type of the unordered_map to some kind of temporary string imposter, a class like this...
class SubString
{
char* buffer;
size_t bufferSize;
// ...
};
... that does the same logic as std::string to hash and compare, but then doesn't deallocate its buffer when it's destroyed.
So, my question is: is there a way to get the standard classes to do this, or do I write this class myself?
What you're wanting to do is called heterogeneous lookup. Since C++14 it's been supported for std::map::find
and std::set::find
(note versions (3) and (4) of the functions, which are templated on the lookup value type). It's more complicated for unordered containers because they need to be told of or find hash functions for all key types that will produce the same hash value for the same text. There's a proposal under consideration for a future Standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0919r0.html
Meanwhile, you could use another library that already supports heterogenous lookup, e.g. boost::unordered_map::find
.
If you want to stick to std::unordered_map
, you could avoid creating so many string temporaries by storing a std::string
member alongside your unordered_map
that you can reassign values to, then pass that string
to find
. You could encapsulate this in a custom container class.
Another route is to write a custom class to use as your unordered container key:
struct CharPtrOrString
{
const char* p_;
std::string s_;
explicit CharPtrOrString(const char* p) : p_{p} { }
CharPtrOrString(std::string s) : p_{nullptr}, s_{std::move(s)} { }
bool operator==(const CharPtrOrString& x) const
{
return p_ ? x.p_ ? std::strcmp(p_, x.p_) == 0
: p_ == x.s_
: x.p_ ? s_ == x.p_
: s_ == x.s_;
}
struct Hash
{
size_t operator()(const CharPtrOrString& x) const
{
std::string_view sv{x.p_ ? x.p_ : x.s_.c_str()};
return std::hash<std::string_view>()(sv);
}
};
};
You can then construct CharPtrOrString
from std::string
s for use in the unordered container keys, but construct one cheaply from your const char*
each time you call find
. Note that operator==
above has to work out which you did (convention used is that if the pointer's nullptr
then the std::string
member's in use) so it compares the in-use members. The hash function has to make sure a std::string
with a particular textual value will produce the same hash as a const char*
(which it doesn't by default with GCC 7.3 and/or Clang 6 - I work with both and remember one had an issue but not which).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With