Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Case-Insensitive STL Containers (e.g. std::unordered_set)

What is the shortest, most cross-platform way to make a std::unordered_set CASE-INSENSITIVE container?

my_set.insert("Apples");  
my_set.insert("apples"); //Insert doesn't occur because of duplicate item

I know STL provides Hash and Pred. What should Hash be? What should Pred be? if they are not-builtins then please provide the code for them along with an example of their use (i.e. how do I declare std::unordered_set?).

Due to the Criticism I will elaborate on what I am trying to do. I need a high performance transparent HTTP proxy server, one of the things it does is looks up HTTP header fields quickly. HTTP header fields are defined as being case-insensitive so I need a case-insensitive container.

like image 765
unixman83 Avatar asked Dec 25 '11 00:12

unixman83


People also ask

What is std :: Unordered_set?

Unordered set is an associative container that contains a set of unique objects of type Key. Search, insertion, and removal have average constant-time complexity. Internally, the elements are not sorted in any particular order, but organized into buckets.

Is std :: map case sensitive?

If you use the standard std::map associative container with std::string or std::wstring as key types, you get a case sensitive comparison by default.

How is std :: Unordered_set implemented?

An unordered_set is implemented using a hash table where keys are hashed into indices of a hash table so that the insertion is always randomized.


2 Answers

The definition of unordered_set is

  template <class Value,
            class Hash = hash<Value>,
            class Pred = std::equal_to<Value>,
            class Alloc = std::allocator<Value> >
  class unordered_set;

If you provide Hash and Pred functors that are case-insensitive, then the set will become so too.

This is a simple example, the string hash function is simplistic but you can change it to your needs

struct MyHash
{
    size_t operator()(const std::string& Keyval) const
    {
        //You might need a better hash function than this
        size_t h = 0;
        std::for_each( Keyval.begin() , Keyval.end() , [&](char c )
        {
            h += tolower(c);
        });
        return h;
    }
};

struct MyEqual
{
    bool operator()(const std::string& Left, const std::string& Right) const
    {
        return Left.size() == Right.size() 
             && std::equal ( Left.begin() , Left.end() , Right.begin() ,
            []( char a , char b )
        {
            return tolower(a) == tolower(b); 
        }
        );
    }
};


int main()
{
    std::unordered_set< std::string , MyHash , MyEqual > m;

    m.insert( "Apple" );
    m.insert( "apple" );

    return 0;
}
like image 134
parapura rajkumar Avatar answered Oct 12 '22 01:10

parapura rajkumar


Personally i would define a value type that was case insensitive and that devolves into a string at the merest hint. Thus allowing me to use the standard hash and predicate models.

#include <string>
#include <unordered_set>
#include <iostream>
#include <algorithm>
#include <iterator>

class LCString
{
    std::string  data;

    public:
        operator std::string&()             {return data;}
        operator std::string const&() const {return data;}

        LCString(char const* init)
        {
            std::transform(init, init + strlen(init),
                           std::back_inserter(data), &::tolower);
        }
};

int main()
{
    typedef std::unordered_set<LCString, 
                               std::hash<std::string>, 
                               std::equal_to<std::string> >  MySet;
    MySet  data;
    data.insert("Apples");
    data.insert("apples");

    std::copy(data.begin(), data.end(),
              std::ostream_iterator<std::string>(std::cout, " - "));
    std::cout << "\n";
}

Thus we only put lowercase values into the set:

> g++ pl.cpp
> ./a.out
apples -
>

Edit Case Preserving:

class LCStringOriginalPreserved
{
    std::string  original;
    std::string  data;

    public:
        operator std::string&()             {return data;}
        operator std::string const&() const {return data;}

        std::string& getOriginal()          {return original;}

        LCString(char const* init)
          : original(init)
        {
            std::transform(original.begin(), original.end(),
                           std::back_inserter(data), &::tolower);
        }
};
like image 21
Martin York Avatar answered Oct 12 '22 00:10

Martin York