My std::strings are encoded in UTF-8, so the std::string < operator doesn't cut it. How can I compare two UTF-8 encoded std::strings?
Where it falls short is with accents: é sorts after z, which it should not.
Thanks
If you don't want a lexicographic ordering by code point (which is what comparing the UTF-8 encoded strings byte by byte will give you), then you will need to decode your UTF-8 encoded strings into UCS-2 or UCS-4 as appropriate (only UCS-4 can represent every code point in a single unit), and apply a suitable comparison function of your choosing.
To reiterate the point, the UTF-8 encoding mechanism is cleverly designed so that if you sort by comparing the unsigned numeric value of each encoded byte, you will get the same result as if you first decoded the string into Unicode and compared the numeric values of each code point. This is also why é (U+00E9) sorts after z (U+007A) with the plain < operator.
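To illustrate the claim above, here is a minimal sketch that compares two strings once by bytes and once by decoded code points. The helper names `decode_utf8`, `less_by_bytes`, and `less_by_code_points` are made up for this example, and the decoder assumes well-formed UTF-8 (it does no validation beyond a bounds check).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into code points.
// Assumption: the input is valid UTF-8; only a bounds check is performed.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }  // 0xxxxxxx
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }  // 110xxxxx
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }  // 1110xxxx
        else               { cp = b & 0x07; len = 4; }  // 11110xxx
        assert(i + len <= s.size());  // a truncated sequence is a precondition violation
        for (std::size_t k = 1; k < len; ++k) {
            // Each continuation byte contributes its low 6 bits.
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Byte-wise comparison; std::string compares chars as unsigned values.
bool less_by_bytes(const std::string& a, const std::string& b) {
    return a < b;
}

// Comparison of the decoded code point sequences.
bool less_by_code_points(const std::string& a, const std::string& b) {
    return decode_utf8(a) < decode_utf8(b);
}
```

For example, with `"z"` and `"\xC3\xA9"` (é), both functions report that z comes first, so decoding alone does not change the order.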
Update: Your updated question indicates that you want a more complex comparison function than purely a lexicographic sort. You will need to decode your UTF-8 strings and compare the decoded characters.
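As a rough sketch of what "decode and compare the decoded characters" could look like, the code below decodes both strings and compares them after folding a handful of accented letters onto their base letter. The folding table and the names `decode_utf8`, `fold_accent`, and `accent_aware_less` are assumptions made up for this illustration; the table covers only a few lowercase Latin letters and is not a substitute for real language-aware collation (for instance the Unicode Collation Algorithm as implemented by a library such as ICU).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into code points.
// Assumption: the input is valid UTF-8; only a bounds check is performed.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        assert(i + len <= s.size());
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Toy folding table (an assumption for this sketch, far from complete):
// maps a few accented lowercase letters to their unaccented base letter.
char32_t fold_accent(char32_t c) {
    switch (c) {
        case 0x00E0: case 0x00E2: case 0x00E4: return U'a';               // à â ä
        case 0x00E7: return U'c';                                         // ç
        case 0x00E8: case 0x00E9: case 0x00EA: case 0x00EB: return U'e';  // è é ê ë
        case 0x00EE: case 0x00EF: return U'i';                            // î ï
        case 0x00F4: case 0x00F6: return U'o';                            // ô ö
        case 0x00F9: case 0x00FB: case 0x00FC: return U'u';               // ù û ü
        default: return c;
    }
}

// Compare by folded letters first; break ties by raw code points so that
// "e" and "é" still have a fixed relative order.
bool accent_aware_less(const std::string& a, const std::string& b) {
    const std::u32string da = decode_utf8(a);
    const std::u32string db = decode_utf8(b);
    std::u32string fa = da;
    std::u32string fb = db;
    std::transform(fa.begin(), fa.end(), fa.begin(), fold_accent);
    std::transform(fb.begin(), fb.end(), fb.begin(), fold_accent);
    if (fa != fb) return fa < fb;
    return da < db;
}
```

A comparator like this can be passed to `std::sort` as the third argument. With it, `"\xC3\xA9"` (é) sorts before `"z"`, whereas the plain `<` operator puts it after.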