Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unordered set (const char) much slower than unordered set (string)

I'm loading a very long list from disk into an unordered_set. If I use a set of strings, it is very fast. A test list of about 7 MB loads in about 1 second. However, using a set of char pointers takes about 2.1 minutes!

Here is the code for the string version:

unordered_set<string> Set;
string key;
while (getline(fin, key))
{
    Set.insert(key);
}

Here is the code for the char* version:

struct unordered_eqstr
{
    bool operator()(const char* s1, const char* s2) const
    {
        return strcmp(s1, s2) == 0;
    }
};

struct unordered_deref
{
    template <typename T>
    size_t operator()(const T* p) const
    {
        return hash<T>()(*p);
    }
};

unordered_set<const char*, unordered_deref, unordered_eqstr> Set;
string key;

while (getline(fin, key))
{
    char* str = new(mem) char[key.size()+1];
    strcpy(str, key.c_str());
    Set.insert(str);
}

The "new(mem)" is because I'm using a custom memory manager so I can allocate big blocks of memory and give them out to tiny objects like c strings. However, I've tested this with regular "new" and the results are identical. I've also used my memory manager in other tools with no problems.

The two structs are necessary to make the insert and find hash based on the actual c string and not its address. The unordered_deref I actually found here on stack overflow.

Eventually I need to load multi-gigabyte files. This is why I'm using a custom memory manager, but it's also why this horrible slow down is unacceptable. Any ideas?

like image 808
Matthew Alpert Avatar asked Oct 11 '22 15:10

Matthew Alpert


1 Answers

Here we go.

struct unordered_deref
{
    size_t operator()(const char* p) const
    {
        return hash<string>()(p);
    }
};
like image 175
Matthew Alpert Avatar answered Oct 14 '22 02:10

Matthew Alpert