What is the shortest, most cross-platform way to make a std::unordered_set CASE-INSENSITIVE container?
my_set.insert("Apples");
my_set.insert("apples"); //Insert doesn't occur because of duplicate item
I know STL provides Hash and Pred. What should Hash be? What should Pred be? if they are not-builtins then please provide the code for them along with an example of their use (i.e. how do I declare std::unordered_set
?).
Due to the Criticism I will elaborate on what I am trying to do. I need a high performance transparent HTTP proxy server, one of the things it does is looks up HTTP header fields quickly. HTTP header fields are defined as being case-insensitive so I need a case-insensitive container.
Unordered set is an associative container that contains a set of unique objects of type Key. Search, insertion, and removal have average constant-time complexity. Internally, the elements are not sorted in any particular order, but organized into buckets.
If you use the standard std::map associative container with std::string or std::wstring as key types, you get a case sensitive comparison by default.
An unordered_set is implemented using a hash table where keys are hashed into indices of a hash table so that the insertion is always randomized.
The definition of unordered_set
is
template <class Value,
class Hash = hash<Value>,
class Pred = std::equal_to<Value>,
class Alloc = std::allocator<Value> >
class unordered_set;
If you provide Hash and Pred functors that are case-insensitive, then the set will become so too.
This is a simple example, the string hash function is simplistic but you can change it to your needs
struct MyHash
{
size_t operator()(const std::string& Keyval) const
{
//You might need a better hash function than this
size_t h = 0;
std::for_each( Keyval.begin() , Keyval.end() , [&](char c )
{
h += tolower(c);
});
return h;
}
};
struct MyEqual
{
bool operator()(const std::string& Left, const std::string& Right) const
{
return Left.size() == Right.size()
&& std::equal ( Left.begin() , Left.end() , Right.begin() ,
[]( char a , char b )
{
return tolower(a) == tolower(b);
}
);
}
};
int main()
{
std::unordered_set< std::string , MyHash , MyEqual > m;
m.insert( "Apple" );
m.insert( "apple" );
return 0;
}
Personally i would define a value type that was case insensitive and that devolves into a string at the merest hint. Thus allowing me to use the standard hash and predicate models.
#include <string>
#include <unordered_set>
#include <iostream>
#include <algorithm>
#include <iterator>
class LCString
{
std::string data;
public:
operator std::string&() {return data;}
operator std::string const&() const {return data;}
LCString(char const* init)
{
std::transform(init, init + strlen(init),
std::back_inserter(data), &::tolower);
}
};
int main()
{
typedef std::unordered_set<LCString,
std::hash<std::string>,
std::equal_to<std::string> > MySet;
MySet data;
data.insert("Apples");
data.insert("apples");
std::copy(data.begin(), data.end(),
std::ostream_iterator<std::string>(std::cout, " - "));
std::cout << "\n";
}
Thus we only put lowercase values into the set:
> g++ pl.cpp
> ./a.out
apples -
>
class LCStringOriginalPreserved
{
std::string original;
std::string data;
public:
operator std::string&() {return data;}
operator std::string const&() const {return data;}
std::string& getOriginal() {return original;}
LCString(char const* init)
: original(init)
{
std::transform(original.begin(), original.end(),
std::back_inserter(data), &::tolower);
}
};
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With