Even "ch" appears as two letters, in czech its considered as one letter and its order in alphabet is after H letter (so correct order is a,b,c,d,e,f,g,h,ch,i,j (I skipped some national characters). But when I do substr (colname, 1, 1) on column containing words begining with ch Im getting only "C"
this sql: SELECT SUBSTRING(title, 1, 1) AS title_truncated FROM node node WHERE node.type in ('termin') GROUP BY title_truncated ORDER BY title_truncated ASC"
returns: A, B, C, D, E, F, G, H, I, J (so no ch).
btw database is using utf8_czech_ci
Ch is not a character in itself in Unicode, it is a digraph.
As such, it seems impossible for a database collation to properly map the difference. What @Ladislav says in the comment, and the user in this mySQL internals discussion, seems to support this.
You will probably have to work around this manually, e.g. in your example, using a IF clause that tests for the presence of "Ch", and returns two characters if that is the case.
Reference: utf8_czech_ci collation table (mySQL 6)
Even though ch is considered a single sorting "letter" in Czech, it isn't considered a single "character" in any other way. It is stored and printed as two characters whenever it is encountered.
The collation setting in MySQL affects how strings are sorted, trying to sort individual characters is not very meaningful in many languages. E.g. č comes after (IIRC) c but ně and ne are equivalent so word ordering depends on the following letters.
I don't understand the underlying problem that you are trying to solve but I think the easiet approach might be to avoid using substring and to sort by title and only output the first "letter" if it changes when you are processing the results.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With