Why is length of "Níðhöggr" 9?

Question

Why does the length function say that this 8-character string is 9 characters long?

>>> length "Níðhöggr"
9

Petr Viktorin · Accepted Answer

"Níðhöggr" contains 9 Unicode characters:

U+004E N (Lu): LATIN CAPITAL LETTER N 
U+00ED í (Ll): LATIN SMALL LETTER I WITH ACUTE
U+00F0 ð (Ll): LATIN SMALL LETTER ETH 
U+0068 h (Ll): LATIN SMALL LETTER H 
U+006F o (Ll): LATIN SMALL LETTER O 
U+0308 ̈ (Mn): COMBINING DIAERESIS 
U+0067 g (Ll): LATIN SMALL LETTER G 
U+0067 g (Ll): LATIN SMALL LETTER G 
U+0072 r (Ll): LATIN SMALL LETTER R
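You can check a breakdown like this yourself with nothing but base. This is a minimal sketch, assuming the string is the one from the question. It is written with explicit `\x` escapes because editors and web pages sometimes normalize the text and silently merge o + U+0308 into a single ö. Base has no Unicode character names, so the sketch prints each code point with its general category.

```haskell
import Data.Char (generalCategory, ord)
import Text.Printf (printf)

-- The string from the question, spelled out with escapes so that the
-- combining diaeresis (U+0308) after the 'o' is preserved exactly.
word :: String
word = "N\x00ED\x00F0ho\x0308ggr"

-- Print every code point together with its Unicode general category.
main :: IO ()
main = mapM_ describe word
  where
    describe c = printf "U+%04X %s\n" (ord c) (show (generalCategory c))
```

The line for U+0308 shows `NonSpacingMark`, which is the `(Mn)` category in the list above.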

You might want to use "Níðhöggr", which looks the same when printed but contains U+00F6 LATIN SMALL LETTER O WITH DIAERESIS instead of the two-code-point o + U+0308 combination. In other words, it is in the composed normal form (NFC).

Or you might want "Níðhöggr", which has 10 Unicode characters (the í is also split into i and U+0301 COMBINING ACUTE ACCENT). That would be the decomposed normal form (NFD).
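The three spellings can be compared directly in Haskell, since `length` on a `String` counts code points. A small sketch: the names `composed`, `mixed` and `decomposed` are just illustrative, and escapes are used so the three literals cannot be merged by normalizing tools. Note that ð (U+00F0) has no decomposition, so it stays a single code point in every form.

```haskell
-- Three spellings of the same word, written with explicit escapes.
composed, mixed, decomposed :: String
composed   = "N\x00ED\x00F0h\x00F6ggr"    -- NFC: í and ö are single code points
mixed      = "N\x00ED\x00F0ho\x0308ggr"   -- as in the question: only ö is split
decomposed = "Ni\x0301\x00F0ho\x0308ggr"  -- NFD: í and ö are both split

main :: IO ()
main = print (map length [composed, mixed, decomposed])  -- [8,9,10]
```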

Search for "Unicode normalization" for the interesting and/or hairy details. Use a Unicode normalization function to normalize Unicode data in Haskell (thanks, Adam Rosenfield!).
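As one possible way to do this, here is a sketch that assumes the third-party `unicode-transforms` package, whose `Data.Text.Normalize` module provides `normalize :: NormalizationMode -> Text -> Text`. This library choice is an assumption for illustration and not necessarily the function the answer originally pointed to. It also needs the `text` package. `T.length` counts code points, just like `length` on a `String`.

```haskell
-- Assumes the unicode-transforms and text packages are installed.
import qualified Data.Text as T
import Data.Text.Normalize (NormalizationMode (NFC, NFD), normalize)

-- The string from the question: 'o' followed by COMBINING DIAERESIS.
original :: T.Text
original = T.pack "N\x00ED\x00F0ho\x0308ggr"

main :: IO ()
main = do
  print (T.length original)                  -- 9 code points as typed
  print (T.length (normalize NFC original))  -- 8 after composing o + U+0308
  print (T.length (normalize NFD original))  -- 10 after also decomposing í
```

Normalizing all input to NFC at the boundary of a program is a common way to make `length` and string comparisons behave predictably for text like this.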

Cairnarvon · Answer

Because your ö isn't the single character ö (U+00F6 LATIN SMALL LETTER O WITH DIAERESIS); it's U+006F LATIN SMALL LETTER O plus U+0308 COMBINING DIAERESIS.
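If what you actually want is the number of characters a reader sees, one rough approach is to skip combining marks when counting. A minimal sketch using only base; `isCombiningMark` and `visibleLength` are hypothetical helper names. This is only an approximation of user-perceived characters (grapheme clusters): it handles accents like U+0308 but not, for example, Hangul jamo sequences, emoji ZWJ sequences or regional-indicator flags.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- True for combining marks (general categories Mn, Mc, Me), which attach
-- to the preceding character instead of standing on their own.
isCombiningMark :: Char -> Bool
isCombiningMark c =
  generalCategory c `elem` [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

-- Approximate visible length: code points minus combining marks.
visibleLength :: String -> Int
visibleLength = length . filter (not . isCombiningMark)

main :: IO ()
main = do
  print (visibleLength "N\x00ED\x00F0h\x00F6ggr")   -- 8 (precomposed ö)
  print (visibleLength "N\x00ED\x00F0ho\x0308ggr")  -- 8 (o + U+0308)
```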

Tags:

haskell

unicode
