I'm loading a very long list from disk into an unordered_set. If I use a set of strings, it is very fast. A test list of about 7 MB loads in about 1 second. However, using a set of char pointers takes about 2.1 minutes!
Here is the code for the string version:
unordered_set<string> Set;
string key;
while (getline(fin, key))
{
Set.insert(key);
}
Here is the code for the char* version:
struct unordered_eqstr
{
bool operator()(const char* s1, const char* s2) const
{
return strcmp(s1, s2) == 0;
}
};
struct unordered_deref
{
template <typename T>
size_t operator()(const T* p) const
{
return hash<T>()(*p);
}
};
unordered_set<const char*, unordered_deref, unordered_eqstr> Set;
string key;
while (getline(fin, key))
{
char* str = new(mem) char[key.size()+1];
strcpy(str, key.c_str());
Set.insert(str);
}
The "new(mem)" is because I'm using a custom memory manager so I can allocate big blocks of memory and give them out to tiny objects like c strings. However, I've tested this with regular "new" and the results are identical. I've also used my memory manager in other tools with no problems.
The two structs are necessary to make the insert and find hash based on the actual c string and not its address. The unordered_deref I actually found here on stack overflow.
Eventually I need to load multi-gigabyte files. This is why I'm using a custom memory manager, but it's also why this horrible slow down is unacceptable. Any ideas?
Here we go.
struct unordered_deref
{
size_t operator()(const char* p) const
{
return hash<string>()(p);
}
};
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With