I have a C++ application that uses an STL set to keep a list of strings (ordered and unique).
The problem I have is that underscores are being ordered the opposite way from what I need.
Example STL order:
"word0"
"word_"
The order I need is:
"word_"
"word0"
I've started implementing a custom compare function to handle this issue, but I would rather use a solution provided within STL (if there is one).
Searching online, I've found some references to this exact same problem but in other systems, and the solution seems to be to change the Collation or the Locale, but I can't seem to find how to do that with STL.
There is no built-in solution to this particular problem, as the libraries expect you to build your own custom comparator to handle this.
However, you may want to look into defining your own char_traits type, which would let you customize how strings are ordered and compared. The tutorials available online for this aren't great, but it is potentially the cleanest and easiest solution to your problem. As a shameless plug, I wrote an answer to this earlier question about char_traits that might be useful for what you're doing.
I would suggest that you not mess around with locales. Locales are designed for localization and are designed to have large, profound effects on how text is handled. A custom comparator or a new char_traits type more directly addresses the problem at hand.
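To make the char_traits suggestion concrete, here is a minimal sketch rather than a definitive implementation. The names underscore_traits, ustring, key and underscore_sorts_first are hypothetical, and the sketch assumes plain ASCII text where '_' should sort immediately before '0'.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical traits type: behaves like std::char_traits<char>, except
// that '_' sorts immediately before '0' (an assumption for this sketch).
struct underscore_traits : std::char_traits<char> {
    // Map every character to a sort key. Ordinary characters get even
    // keys; '_' gets the odd key just below the key of '0'.
    static int key(char c) {
        return c == '_' ? 2 * '0' - 1 : 2 * static_cast<unsigned char>(c);
    }
    static bool lt(char a, char b) { return key(a) < key(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
};

// A string type whose relational operators use the traits above.
typedef std::basic_string<char, underscore_traits> ustring;

// Returns true if "word_" sorts before "word0" in a std::set<ustring>.
bool underscore_sorts_first() {
    std::set<ustring> words;  // default std::less<ustring> uses operator<
    words.insert("word0");
    words.insert("word_");
    return *words.begin() == "word_";
}
```

Note that ustring is a different type from std::string, so values have to be converted (for example through c_str()) wherever the two meet.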
Matt Austern wrote a paper on "How to do case-insensitive string comparison" that handles locales properly. It may contain the information on locales and facets that you're looking for.
Otherwise, if you're just looking to reverse the usual comparison order of a couple of characters, shouldn't using std::lexicographical_compare with your own comparison function object do the job?
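bool mycomp( char c1, char c2 )
{
    // Treat '_' (0x5F) as less than '0' (0x30), reversing their usual order
    if ( ( c1 == '_' ) && ( c2 == '0' ) )
        return true;
    if ( ( c1 == '0' ) && ( c2 == '_' ) )
        return false;
    // Every other pair keeps the default character order
    return ( c1 < c2 );
}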
std::string w1 = "word0";
std::string w2 = "word_";
bool t1 = std::lexicographical_compare( w1.begin(), w1.end(), w2.begin(), w2.end() );
bool t2 = std::lexicographical_compare( w1.begin(), w1.end(), w2.begin(), w2.end(), mycomp );

"word0" evaluates less than "word_" in the first case, and greater in the second, which is what you're after.
If you're already doing something similar, that's the easiest way to go.
Edit: On the subject of using char_traits to accomplish this, Austern's article notes:
The Standard Library type std::string uses the traits parameter for all comparisons, so, by providing a traits parameter with equality and less than redefined appropriately, you can instantiate basic_string in such a way so that the < and == operators do what you need. You can do it, but it isn't worth the trouble. You won't be able to do I/O, at least not without a lot of pain. You won't be able to use ordinary stream objects like cin and cout.
He goes on to list several other good reasons why modifying char_traits to perform this comparison isn't a good idea.
I highly recommend that you read Austern's paper.
You can use std::lexicographical_compare with a custom predicate to compare strings with a custom character ordering, as Gnawme already said. The following code combines std::set with std::lexicographical_compare.
#include <iostream>
#include <set>
#include <string>
#include <algorithm>

struct comp
{
    // Character predicate: reverses the default order of '0' and '_' only.
    static bool compchar(char a, char b)
    {
        if ((a == '0' && b == '_') || (a == '_' && b == '0'))
            return !(a < b);
        else
            return (a < b);
    }

    // String predicate used by std::set.
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(),
                                            compchar);
    }
};

int main()
{
    std::set<std::string, comp> test;
    test.insert("word0");
    test.insert("word_");
    for (std::set<std::string, comp>::const_iterator cit = test.begin();
         cit != test.end(); ++cit)
        std::cout << *cit << std::endl;
    return 0;
}
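One caveat with swapping only the '_'/'0' pair: '_' then sorts before '0', and '0' before '1', yet '1' (0x31) still sorts before '_' (0x5F), so the character order is no longer transitive. std::set requires a strict weak ordering, so a set that also holds a string like "word1" may misbehave. A possible way around this, sketched below under the assumption that '_' should sit immediately before the digits, is to compare sort keys instead of special-casing one pair. The names underscore_before_digits, key and sorted_words are hypothetical.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical comparator: ASCII order, except that '_' is moved to the
// slot immediately before '0'. Comparing keys keeps the order transitive.
struct underscore_before_digits {
    // Ordinary characters get even keys; '_' gets the odd key just
    // below the key of '0'.
    static int key(char c) {
        return c == '_' ? 2 * '0' - 1 : 2 * static_cast<unsigned char>(c);
    }
    static bool compchar(char a, char b) { return key(a) < key(b); }
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(a.begin(), a.end(),
                                            b.begin(), b.end(),
                                            compchar);
    }
};

// Inserts three sample words and returns them in set order.
std::vector<std::string> sorted_words() {
    std::set<std::string, underscore_before_digits> words;
    words.insert("word1");
    words.insert("word0");
    words.insert("word_");
    return std::vector<std::string>(words.begin(), words.end());
}
```

With this comparator the three sample words come out as "word_", "word0", "word1".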
There is a std::collate class, and here is a brief explanation of facet usage in C++ with a few examples of how it can be used.
But you'd probably need to implement the actual logic yourself anyway.
And: "The string class in the Standard C++ Library does not provide any service for locale-sensitive string comparisons." Hence you'd also need to wrap the locale-usage in a separate comparison function.
So if an existing locale doesn't compare the strings the way you like, going this way looks like overkill.
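For completeness, here is roughly what the facet route could look like. This is a sketch under assumptions, not a recommendation: underscore_collate, key and facet_sorts_underscore_first are hypothetical names, and the facet implements the comparison logic itself, as noted above. It relies on std::locale::operator(), which compares two strings through the locale's collate facet, so the locale object itself can act as the set's comparator.

```cpp
#include <locale>
#include <set>
#include <string>

// Hypothetical collate facet: plain character order, except that '_'
// sorts immediately before '0'. Only do_compare is overridden here;
// do_transform and do_hash keep their default behavior.
class underscore_collate : public std::collate<char> {
protected:
    virtual int do_compare(const char* lo1, const char* hi1,
                           const char* lo2, const char* hi2) const {
        for (; lo1 != hi1 && lo2 != hi2; ++lo1, ++lo2) {
            const int k1 = key(*lo1);
            const int k2 = key(*lo2);
            if (k1 != k2) return k1 < k2 ? -1 : 1;
        }
        if (lo1 != hi1) return 1;   // second string is a proper prefix
        if (lo2 != hi2) return -1;  // first string is a proper prefix
        return 0;
    }
private:
    static int key(char c) {
        return c == '_' ? 2 * '0' - 1 : 2 * static_cast<unsigned char>(c);
    }
};

// Returns true if "word_" sorts before "word0" when the set is ordered
// by a locale carrying the facet above.
bool facet_sorts_underscore_first() {
    // The locale takes ownership of the facet and deletes it when done.
    std::locale loc(std::locale::classic(), new underscore_collate);
    std::set<std::string, std::locale> words(loc);
    words.insert("word0");
    words.insert("word_");
    return *words.begin() == "word_";
}
```

Compared with a plain comparator struct this is more machinery for the same result, which fits the "overkill" assessment.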