Why is the length function saying that this 8-character string is 9 characters?
>>> length "Níðhöggr"
9
"Níðhöggr" contains 9 Unicode characters:
U+004E N (Lu): LATIN CAPITAL LETTER N
U+00ED í (Ll): LATIN SMALL LETTER I WITH ACUTE
U+00F0 ð (Ll): LATIN SMALL LETTER ETH
U+0068 h (Ll): LATIN SMALL LETTER H
U+006F o (Ll): LATIN SMALL LETTER O
U+0308 ̈ (Mn): COMBINING DIAERESIS
U+0067 g (Ll): LATIN SMALL LETTER G
U+0067 g (Ll): LATIN SMALL LETTER G
U+0072 r (Ll): LATIN SMALL LETTER R
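The listing above can be reproduced directly in Haskell. The following is a minimal sketch using only `base`; the names `name`, `codePoint`, and `visibleLength` are made up for illustration, and the string is written with escapes so that the combining mark is unambiguous in the source. Note that skipping combining marks is only a rough stand-in for counting user-perceived characters, not full grapheme segmentation.

```haskell
import Data.Char (isMark, ord, toUpper)
import Numeric (showHex)

-- The string from the question, spelled out code point by code point:
-- N, U+00ED, U+00F0, h, o, U+0308 (combining diaeresis), g, g, r.
name :: String
name = "N\xED\xF0ho\x308ggr"

-- Render a Char in the usual U+XXXX notation (padded to 4 hex digits).
codePoint :: Char -> String
codePoint c = "U+" ++ replicate (4 - length hex) '0' ++ hex
  where
    hex = map toUpper (showHex (ord c) "")

-- Count code points that are not combining marks (general categories
-- Mn, Mc, Me). An approximation of "what the reader sees" only.
visibleLength :: String -> Int
visibleLength = length . filter (not . isMark)

-- In GHCi:
--   length name          evaluates to 9
--   visibleLength name   evaluates to 8
--   map codePoint name   lists the nine code points shown above
```

`length` on a `String` counts `Char` values, that is, Unicode code points, which is why the combining diaeresis is counted separately.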
You might want to use "Níðhöggr", which looks the same when printed but contains the single character U+00F6 LATIN SMALL LETTER O WITH DIAERESIS instead of the two-character sequence of o followed by a combining diaeresis. In other words, it is in composed normal form (NFC).
Or you might want "Níðhöggr", which has 10 Unicode characters (the í is split into i and a combining acute accent). That would be decomposed normal form (NFD).
Google "Unicode normalization" for interesting and/or hairy details. Use this function to normalize Unicode data in Haskell (thanks, Adam Rosenfield!).
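As a rough illustration of what normalization does to the length, here is a sketch that assumes the `unicode-transforms` package and its `Data.Text.Normalize` module. This is not necessarily the function the answer linked to; it is one option, and ICU bindings offer similar functionality. The names `nameT`, `nfcLength`, and `nfdLength` are invented for this example.

```haskell
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFC, NFD), normalize)

-- The string from the question as Text: 9 code points, with the
-- o + U+0308 pair left uncomposed.
nameT :: T.Text
nameT = T.pack "N\xED\xF0ho\x308ggr"

-- Length after composing: o + U+0308 becomes the single U+00F6.
nfcLength :: Int
nfcLength = T.length (normalize NFC nameT)

-- Length after decomposing: U+00ED also splits into i + U+0301.
nfdLength :: Int
nfdLength = T.length (normalize NFD nameT)

-- In GHCi:
--   T.length nameT   evaluates to 9
--   nfcLength        evaluates to 8
--   nfdLength        evaluates to 10
```

Normalizing to NFC before calling a length function gives 8 for this particular string, but strings containing sequences with no precomposed form will still count more code points than visible characters.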
Because your ö isn't the single character ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS); it's U+006F LATIN SMALL LETTER O plus U+0308 COMBINING DIAERESIS.