
How to create a custom collator?

I am using the following function as the comparison when sorting a list of strings:

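#include <algorithm>
#include <cctype>
#include <locale>
#include <string>
using namespace std;

bool stringLessThan(const string& str1, const string& str2)
{
   // Collation facet of the current global locale
   const collate<char>& col = use_facet<collate<char> >(locale());

   string s1(str1);
   string s2(str2);

   // Lowercase the copies; going through unsigned char avoids undefined
   // behavior when a char holds a negative value (e.g. UTF-8 bytes)
   auto lower = [](unsigned char c) { return static_cast<char>(std::tolower(c)); };
   transform(s1.begin(), s1.end(), s1.begin(), lower);
   transform(s2.begin(), s2.end(), s2.begin(), lower);
   const char* pb1 = s1.data();
   const char* pb2 = s2.data();
   return (col.compare(pb1, pb1 + s1.size(), pb2, pb2 + s2.size()) < 0);
}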

I am setting the global locale as:

locale::global(locale("pt_BR.UTF-8")); 

If I use the en_EN.UTF-8 locale, words with accents in my language (Brazilian Portuguese) are sorted in a different order than I want, so I use pt_BR.UTF-8. But then the string "as" comes before "a", and I want "a" first and then "as".

The reason is that the collator ignores spaces, so strings like:

a pencil
an apple

will be considered as:

apencil
anapple

and if sorted, will appear in this order:

an apple
a pencil

but I want:

a pencil
an apple

I did this in Java, where the solution was to create a custom collator. How can I handle this in C++?

ViniciusArruda asked Feb 11 '26 02:02


1 Answer

Try creating your own collator class or comparison function. In Java the more idiomatic approach might be to do this through extension (inheritance), but in C++, and for your case, I'd recommend composition.

This simply means that your custom collator class would hold a collator as a member and use it to help perform the collation, as opposed to deriving from the collate class.

As for your comparison rules, it seems that you will need to implement your own logic explicitly. If you don't want spaces to be ignored, perhaps you should tokenize your strings and compare them word by word.
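
A minimal sketch of that idea, under a few assumptions: the class name WordCollator and its helper split are hypothetical, words are separated by whitespace, and the lowercasing step from the question is left out for brevity. The class owns a std::locale (composition) and delegates each word-to-word comparison to that locale's std::collate facet, so a shorter first word such as "a" is ordered before "an" regardless of what follows it.

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): holds a locale as a member
// and compares strings word by word using that locale's collate facet.
class WordCollator {
public:
    // Defaults to a copy of the current global locale.
    explicit WordCollator(const std::locale& loc = std::locale()) : loc_(loc) {}

    // Returns true if lhs should be ordered before rhs; usable with std::sort.
    bool operator()(const std::string& lhs, const std::string& rhs) const {
        const std::vector<std::string> a = split(lhs);
        const std::vector<std::string> b = split(rhs);
        const std::collate<char>& col =
            std::use_facet<std::collate<char> >(loc_);

        const std::size_t n = std::min(a.size(), b.size());
        for (std::size_t i = 0; i < n; ++i) {
            const int r = col.compare(a[i].data(), a[i].data() + a[i].size(),
                                      b[i].data(), b[i].data() + b[i].size());
            if (r != 0) return r < 0;  // first differing word decides
        }
        // All shared words are equivalent: the string with fewer words wins.
        return a.size() < b.size();
    }

private:
    // Splits on whitespace; assumes whitespace is the only word separator.
    static std::vector<std::string> split(const std::string& s) {
        std::istringstream in(s);
        std::vector<std::string> words;
        std::string w;
        while (in >> w) words.push_back(w);
        return words;
    }

    std::locale loc_;
};
```

After calling locale::global(locale("pt_BR.UTF-8")), something like std::sort(v.begin(), v.end(), WordCollator()) would then sort a vector of strings v with this rule. Whether that locale name is available depends on the system, and constructing it throws std::runtime_error if it is not installed.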

tep answered Feb 13 '26 16:02