Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

std::string_view and std::string in std::unordered_set [duplicate]

Let's say you have an std::unordered_set<std::string> .

You have an std::string_view object that you want to search for in the container. Problem is, you don't want to create a std::string from your std::string_view, as this kind of defeats the purpose of using std::string_view in the first place.

However, it seems that std::string_view should be usable as a key; there should be some way to compare std::string_view and std::string , as they both basically represent the same thing. But there isn't, not in the STL anyway.

This is an impasse, am I forced to write my own comparison object for std::string_view and std::string objects to use with my std::unordered_set ?

edit: This question is specific to string_view objects. The 'duplicate' question is not relevant. I received a unique answer to a unique question, as expected.

like image 845
Adam Avatar asked Aug 08 '18 14:08

Adam


1 Answers

I don't have a nice solution, but a possible workaround with minimal custom code, at the expense of increased memory usage, would be to replace your std::unordered_set<std::string> with a std::unordered_map that has views for keys, and strings for values (which back the views).

Unfortunately, thanks to the small string optimization, we can't rely on std::move preserving the original address of the underlying string's data, so something like:

std::string to_insert(...);
mymap.try_emplace(to_insert, std::move(to_insert));

isn't going to work.

Instead, it would have to be a std::unordered_map<std::string_view, std::unique_ptr<std::string>> so we could preserve the unique address of the string's characters, making the code more like:

auto to_insert = std::make_unique<std::string>(...);
mymap.try_emplace(*to_insert, std::move(to_insert));

While insertion would be kinda ugly, simple membership testing would remain simple, since std::string defines an implicit operator std::string_view, and std::string_view has an implicit constructor for char*, so membership testing remains a simple:

if (mymap.count(some_string)) { ... }

whether some_string is a char*, std::string_view or std::string.

Note: I'm not going to swear the two-liner try_emplace-based insertion code is legal, as I'm a bit out of practice on C++, and quite leery about using a unique_ptr in the same expression I'm moveing from it; on g++ 7.2 it seems to work, and I think the fact that the key argument to try_emplace is constructed immediately, while the arguments to construct the value are forwarded makes this safe, but I'll acknowledge that my understanding of C++ evaluation order (or lack thereof) is not perfect. If I'm doing something illegal, not just ugly, then fixing it would require the slightly uglier (but definitely sequenced):

auto to_insert = std::make_unique<std::string>(...);
std::string_view key{*to_insert};
mymap.try_emplace(std::move(key), std::move(to_insert));

Additional note: Only the emplace/emplace_hint/try_emplace functions can be safely used to update the entries in mymap in this design. Using mymap[key] = std::move(to_insert); or insert_or_assign breaks if the same key is encountered twice while building the map, as the original string_view (referencing the original string's data) would be preserved, while the value would be replaced with the new string, invalidating the string_view's pointer. While insert doesn't replace values, I believe using it would require a design more like the three-liner with try_emplace, as making the std::pair to insert would be unsequenced if you tried to construct both view and unique_ptr as part of pair construction.

like image 200
ShadowRanger Avatar answered Nov 17 '22 09:11

ShadowRanger