I have four options on Dreamweaver: C, D, KC, KD. Which one should I choose and why?
Essentially, the Unicode Normalization Algorithm puts all combining marks in a specified order, and uses rules for decomposition and composition to transform each string into one of the Unicode Normalization Forms. A binary comparison of the transformed strings will then determine equivalence.
Unicode normalization is our solution to both canonical and compatibility equivalence issues. In normalization, there are two directions and two types of conversions we can make. The two types, canonical and compatibility, we have already covered; the two directions are decomposition and composition.
NFD. Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
normalize(form, unistr): Return the normal form given by form for the Unicode string unistr. Valid values for form are 'NFC', 'NFKC', 'NFD', and 'NFKD'. The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence.
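The signature quoted above matches Python's unicodedata.normalize, so assuming that is the function meant, a minimal sketch of calling it with each of the four forms might look like this. The sample string and the variable names are illustrative only.

```python
import unicodedata

# Sample input: e-acute as a single precomposed code point (U+00E9).
sample = "\u00e9"

# Apply each of the four normalization forms to the same input.
forms = {
    form: unicodedata.normalize(form, sample)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# Print the code points so the differences are visible.
for form, result in forms.items():
    print(form, [f"U+{ord(ch):04X}" for ch in result])
```

For this input the composed forms (NFC, NFKC) keep one code point and the decomposed forms (NFD, NFKD) yield two.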
For what? For saving a file, use NFC (the C option), as the web character model uses it (strictly, W3C normalisation requires both that the stream be in NFC and that it still be in NFC once entities in HTML or XML have been converted to the characters they represent). The odds that it'll ever make a practical difference are slim, though it could stop a few rather obscure issues upsetting someone down the line.
Normalisation makes certain equivalent sequences result in identical streams. For example, U+0065 (e) followed by U+0301 (a combining acute accent) is equivalent to U+00E9 (é) on its own.
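As a rough illustration in Python (assuming the standard unicodedata module; the helper name canonically_equal is made up for this sketch), the two spellings of é compare unequal as raw strings but equal once both are normalised to the same form:

```python
import unicodedata

composed = "\u00e9"      # U+00E9, e-acute as one code point
decomposed = "e\u0301"   # U+0065 followed by U+0301 (combining acute)

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)                   # raw comparison of code points
print(canonically_equal(composed, decomposed))  # comparison after normalisation
```

Any of the forms would do for the comparison, as long as both sides use the same one.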
NFD splits all such strings up into their component parts (e.g. turning U+00E9 into U+0065 followed by U+0301). If there are two or more combining characters in a row, they are re-ordered according to rules that give a consistent result (ḉ could have the cedilla followed by the acute or the acute followed by the cedilla, and we need a consistent ordering to have the same string produced). Mostly NFD is useful for internal processing as part of another task, such as stripping accents, or producing NFC.
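A small Python sketch of both points, assuming the standard unicodedata module. The function name strip_accents is hypothetical, and dropping every nonspacing mark (category "Mn") is one common way of stripping accents rather than the only one.

```python
import unicodedata

# The same letter typed with the combining marks in two different orders.
acute_first = "c\u0301\u0327"    # c + combining acute + combining cedilla
cedilla_first = "c\u0327\u0301"  # c + combining cedilla + combining acute

# NFD puts the marks into one canonical order, so both inputs converge.
nfd_a = unicodedata.normalize("NFD", acute_first)
nfd_b = unicodedata.normalize("NFD", cedilla_first)

def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose, then drop nonspacing combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(nfd_a == nfd_b)
print(strip_accents("caf\u00e9"))
```

Note that accent stripping is lossy, so it suits matching and sorting keys better than stored text.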
NFC starts with NFD and then combines the characters together again where possible, barring a few exceptions to ensure that what was a normalised string with one version of Unicode remains so with another.
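To make that concrete, here is a sketch in Python (again assuming the standard unicodedata module). The second half uses U+0958, which to my knowledge is on Unicode's composition exclusion list; it is shown only as an example of a character that NFC leaves decomposed, without claiming which reason put it on the list.

```python
import unicodedata

# Marks in a non-canonical order: c + combining acute + combining cedilla.
messy = "c\u0301\u0327"

# NFC reorders the marks and then recombines them into one code point.
composed = unicodedata.normalize("NFC", messy)
print([f"U+{ord(ch):04X}" for ch in composed])

# A character that NFC does not recompose: DEVANAGARI LETTER QA (U+0958).
excluded = "\u0958"
nfc_excluded = unicodedata.normalize("NFC", excluded)
print([f"U+{ord(ch):04X}" for ch in nfc_excluded])
```

So NFC output is usually, but not always, the shortest spelling of a string.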
NFKD goes further than NFD in replacing certain similar characters with each other. ⁵ for example is replaced with 5. This "damages" the text (a user may reasonably choose ⁵ over 5 for a good reason) but is useful for searching (search for "fiſh" on Google and it returns results for "fish" because it treats the long-s the same as a short-s) and as a restriction in certain cases to avoid security issues with similar but different characters. NFKC first does NFKD and then combines in the same manner as NFC.
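The difference between the canonical and compatibility forms can be sketched in Python like this, assuming the standard unicodedata module. The search_key helper is hypothetical, and pairing NFKC with casefold() is just one plausible way to build a search key, not a description of how any search engine works.

```python
import unicodedata

superscript_five = "\u2075"   # SUPERSCRIPT FIVE
long_s_word = "fi\u017fh"     # "fish" spelled with LATIN SMALL LETTER LONG S

# Canonical forms leave compatibility characters alone.
nfc_five = unicodedata.normalize("NFC", superscript_five)

# Compatibility forms replace them with their plain counterparts.
nfkd_five = unicodedata.normalize("NFKD", superscript_five)
nfkc_word = unicodedata.normalize("NFKC", long_s_word)

def search_key(text: str) -> str:
    """Hypothetical helper: fold compatibility variants and case for matching."""
    return unicodedata.normalize("NFKC", text).casefold()

print(nfc_five == superscript_five, nfkd_five, nfkc_word)
print(search_key("Fi\u017fh") == search_key("FISH"))
```

Because the replacement cannot be undone, it is safer to keep the original text and store the key alongside it.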
See http://unicode.org/reports/tr15/ for the full skinny, and, to repeat the short answer, "use NFC but don't worry about it".