Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoiding key construction for std::map::find()

Tags:

Suppose I've got a std::map<std::string, int>. std::string can be compared to C strings (const char*) without std::string temporaries. However, map::find() appears to force me to construct a temporary std::string, which probably requires a memory allocation. How do I avoid this? Conceptually it's easy, but the STL appears to prevent this.

#include <map>  int main() {     std::map<std::string, int> m;     m.find("Olaf"); } 
like image 398
XTF Avatar asked May 10 '12 14:05

XTF


People also ask

What does std::map return if key not found?

std::map operator[] inserts the default constructed value type in to the map if the key provided for the lookup doesn't exist. So you will get an empty string as the result of the lookup.

How do you find if a map contains a key in C++?

To check for the existence of a particular key in the map, the standard solution is to use the public member function find() of the ordered or the unordered map container, which returns an iterator to the key-value pair if the specified key is found, or iterator to the end of the container if the specified key is not ...

What does map return if key does not exist C++?

Here map size will increase by 1 . To search a key you can use map_name. find() , which will return map_name. end() if the key doesn't exist.

What does map Find return if nothing is found C++?

map find() function in C++ STL If the key is not present in the map container, it returns an iterator or a constant iterator which refers to map. end().


2 Answers

Your concern is real, and there's no good workaround for C++11.

C++14 fixes this issue by adding a templated overload of std::map::find — the relevant proposal is N3657. In C++14, your program would look like this:

#include <stdio.h> #include <stdlib.h> #include <string.h> #include <map> #include <algorithm>  class std_string {     char *m_s; public:     std_string() { m_s = nullptr; }     std_string(const char* s) { puts("Oops! A new std_string was constructed!"); m_s = strdup(s); }     ~std_string() { free(m_s); }     std_string(std_string&& ss) = delete;     std_string(const std_string& ss) = delete;     std_string& operator=(std_string&& ss) = delete;     std_string& operator=(const std_string& ss) = delete;      bool operator< (const char* s) const { return strcmp(m_s, s) < 0; }     bool operator< (const std_string& ss) const { return strcmp(m_s, ss.m_s) < 0; }     friend bool operator< (const char* s, const std_string& ss) { return strcmp(s, ss.m_s) < 0; } };  int main() {     {         puts("The C++11 way makes a copy...");         std::map<std_string, int> m;         auto it = m.find("Olaf");     }     {         puts("The C++14 way doesn't...");         std::map<std_string, int, std::less<>> m;         auto it = m.find("Olaf");     } } 

(std::less<> is the generalized "less-than" comparator, equivalent to operator<. C++03 and C++11 have a broken-by-design version of this comparator that forces both arguments to be the same type. C++14 finally does it right.)

Unfortunately the Committee seems to have decided that people should go through all their C++11 code and update every container to use std::less<> as the comparator — it doesn't just happen by default. There's no good reason for this decision; it's just The Way It Is. (See my comments above about broken-by-design. C++ has a bad habit of introducing broken versions of things before introducing the "real" versions several years later.)

For C++11, std::map::find has only one overload (the one that takes const Key&), so any workaround will necessarily involve changing the Key type to be less expensive — we can't just fiddle with the comparator, because by the time execution reaches the comparator, we've already promoted find's argument to the Key type.

#include <stdio.h> #include <stdlib.h> #include <string.h> #include <map> #include <algorithm>  class std_string {     char *m_s; public:     std_string() : m_s(nullptr) { }     std_string(const char* s) : m_s(strdup(s)) { puts("Oops! A new std_string was constructed!"); }     ~std_string() { free(m_s); }     std_string(std_string&& ss) : m_s(nullptr) { std::swap(m_s, ss.m_s); }     std_string(const std_string& ss) : m_s(strdup(ss.data())) { puts("Oops! A new std_string was constructed!"); }     std_string& operator=(std_string&& ss) = delete;     std_string& operator=(const std_string& ss) = delete;      const char* data() const { return m_s; }      bool operator< (const char* s) const { return strcmp(m_s, s) < 0; }     bool operator< (const std_string& ss) const { return strcmp(m_s, ss.m_s) < 0; }     friend bool operator< (const char* s, const std_string& ss) { return strcmp(s, ss.m_s) < 0; } };  struct string_or_ptr {     union {         const char* ptr;         alignas(std_string) unsigned char str[sizeof (std_string)];     } m_u;     bool m_deep;      char const* & ptr() { return m_u.ptr; }     std_string& str() { return *reinterpret_cast<std_string*>(m_u.str); }     char const* const & ptr() const { return m_u.ptr; }     std_string const& str() const { return *reinterpret_cast<const std_string*>(m_u.str); }      string_or_ptr() : m_deep(false) { ptr() = ""; }     string_or_ptr(const char* s) : m_deep(false) { ptr() = s; }     string_or_ptr(std_string&& s) : m_deep(true) { new ((void*)&str()) std_string(std::move(s)); }     string_or_ptr(const std_string& s) : m_deep(true) { new ((void*)&str()) std_string(s); }     ~string_or_ptr() { if (m_deep) str().~std_string(); }     std_string& operator=(std_string&& ss) = delete;     std_string& operator=(const std_string& ss) = delete;       operator const char*() const { return m_deep ? str().data() : ptr(); }      bool operator< (const char* s) const { return strcmp((const char*)*this, s) < 0; }     bool operator< (const std_string& ss) const { return (const char*)*this < ss; }     bool operator< (const string_or_ptr& sp) const { return strcmp((const char*)*this, (const char*)sp) < 0; }     friend bool operator< (const char* s, const string_or_ptr& sp) { return strcmp(s, (const char*)sp) < 0; }     friend bool operator< (const std_string& ss, const string_or_ptr& sp) { return ss < (const char*)sp; } };  int main() {     {         puts("The C++11 way...");         std::map<std_string, int> m;         auto it = m.find("Olaf");     }     {         puts("The C++11 way with a custom string-or-pointer Key type...");         std::map<string_or_ptr, int> m;         auto it = m.find("Olaf");     } } 
like image 155
Quuxplusone Avatar answered Sep 21 '22 20:09

Quuxplusone


There isn't actually a way to force find to use a comparison operator different from the one used to create the map. If you could pass a different one into find, how could it guarantee that both comparisons would provide the same ordering?

Instead, just think about the cases:

1) You're passing around char* in your program. In this case, just don't do that. Use std::string instead, creating it one time if needed, as close to the origination as possible. Then no conversion is needed.

2) You're trying to find string literals. In this case, why is the key a string? Make the key be a nicely named enumerated type instead:

enum names { OLAF }; map<names, int> m; m.find(OLAF); 

3) You want to find both strings and C-string literals. In this case, I would create a global lookup table of strings indexed by an enumeration but built once at the start of main. Then you would call something like m.find(global_strings[OLAF]);

EDIT: You seem very focused/concerned about the performance implications of string here. Have you profiled your application and found that string's allocations are a significant portion of your app's time? I would of course believe this on embedded systems/devices.

Additionally, you've tagged your question C++ but you seem to be outright refusing to use C++'s built in string feature which does far more than "cost you performance". It provides a variety of helpful functions/methods/operators but most importantly it manages the memory for you so you don't spend days or weeks hunting down those truly insidious bugs that no doubt will crop up.

If you're reading variable length data from the network I can't quite grasp the performance difference between char* buffer = new char[needed_size]; and something like std::string s; s.resize(needed_size); other than that using string provides some safety and memory management for you.

like image 26
Mark B Avatar answered Sep 22 '22 20:09

Mark B