Normalization needed after case folding

Question

Given an NFC-normalized string, if I apply full case folding to it, can I assume that the result is also NFC-normalized?

I don't understand what the Unicode standard is trying to tell me in this quote:

Normalization also interacts with case folding. For any string X, let Q(X) = NFC(toCasefold(NFD(X))). In other words, Q(X) is the result of normalizing X, then case folding the result, then putting the result into Normalization Form NFC format. Because of the way normalization and case folding are defined, Q(Q(X)) = Q(X). Repeatedly applying Q does not change the result; case folding is closed under canonical normalization for either Normalization Form NFC or NFD.
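To make the quoted definition concrete, here is a minimal Python sketch of Q(X). It assumes Python's `str.casefold()` as a stand-in for `toCasefold`; Python documents it as implementing the default (full) case folding from section 3.13 of the Unicode Standard. The helper name `q` and the sample strings are illustrative only. Results depend on the Unicode version bundled with the interpreter (`unicodedata.unidata_version`).

```python
import unicodedata

def q(x: str) -> str:
    """Q(X) = NFC(toCasefold(NFD(X))), with str.casefold() standing in for toCasefold."""
    decomposed = unicodedata.normalize("NFD", x)
    return unicodedata.normalize("NFC", decomposed.casefold())

# Illustrative inputs: sharp s, sharp s + combining acute, capital sharp s,
# capital I with dot above, and A + combining ring above.
samples = ["Stra\u00dfe", "\u00df\u0301", "\u1e9e", "\u0130", "A\u030a"]

for s in samples:
    once = q(s)
    # The standard's claim: applying Q a second time does not change the result.
    assert q(once) == once
    print(repr(s), "->", repr(once))
```

Note that idempotence of Q says nothing by itself about whether the intermediate `toCasefold(...)` output was already in NFC. Q ends with an explicit NFC step.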

nwellnhof · Accepted Answer

A Unicode string might not be in NFC after case folding. An example is U+00DF (LATIN SMALL LETTER SHARP S) followed by U+0301 (COMBINING ACUTE ACCENT).

X = U+00DF U+0301
NFC(X) = U+00DF U+0301
toCasefold(NFC(X)) = U+0073 U+0073 U+0301
NFC(toCasefold(NFC(X))) = U+0073 U+015B
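The steps above can be reproduced with Python's standard library as a sketch, assuming `str.casefold()` matches `toCasefold` (full case folding) and Python 3.8+ for `unicodedata.is_normalized`. The `codepoints` helper is hypothetical, added only for display.

```python
import unicodedata

def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

x = "\u00df\u0301"  # LATIN SMALL LETTER SHARP S + COMBINING ACUTE ACCENT

# There is no precomposed "sharp s with acute", so NFC leaves X unchanged.
nfc_x = unicodedata.normalize("NFC", x)
# Full case folding maps U+00DF to "ss".
folded = nfc_x.casefold()
# The second "s" can now compose with U+0301 into U+015B.
renormalized = unicodedata.normalize("NFC", folded)

print("NFC(X)                  =", codepoints(nfc_x))         # U+00DF U+0301
print("toCasefold(NFC(X))      =", codepoints(folded))        # U+0073 U+0073 U+0301
print("NFC(toCasefold(NFC(X))) =", codepoints(renormalized))  # U+0073 U+015B
print("folded already NFC?", unicodedata.is_normalized("NFC", folded))  # False
```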

Anthony Faull · Answer

You have asked two questions:

Question 1: Is toCasefold(NFC(X)) binary equal to NFC(toCasefold(NFC(X)))?

The standard doesn't explicitly answer this question. (I would expect the answer to be yes, i.e. that case folding does not affect normalization, but I have no proof.)
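Question 1 can at least be probed empirically. The sketch below is a spot check, not a proof. It assumes `str.casefold()` stands in for `toCasefold` and requires Python 3.9+. It only tries single code points whose case folding changes them, each followed by one combining mark (U+0301 by default), and collects inputs for which `toCasefold(NFC(X))` is not already in NFC. The function names `casefold_stays_nfc` and `find_counterexamples` are hypothetical.

```python
import sys
import unicodedata

def casefold_stays_nfc(x: str) -> bool:
    """True if toCasefold(NFC(X)) is already NFC, i.e. equals NFC(toCasefold(NFC(X)))."""
    folded = unicodedata.normalize("NFC", x).casefold()
    return unicodedata.is_normalized("NFC", folded)

def find_counterexamples(mark: str = "\u0301", limit: int = 10) -> list[str]:
    """Try each code point that case folding changes, followed by `mark`."""
    found = []
    for cp in range(sys.maxunicode + 1):
        c = chr(cp)
        if c.casefold() == c:
            continue  # case folding leaves this code point alone; skip it
        candidate = c + mark
        if not casefold_stays_nfc(candidate):
            found.append(candidate)
            if len(found) >= limit:
                break
    return found

if __name__ == "__main__":
    for s in find_counterexamples():
        print(" ".join(f"U+{ord(c):04X}" for c in s))
```

An empty result for one mark would not settle the question for all strings; a non-empty result is enough to show the answer is no.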

Question 2: What is the Unicode standard telling me in the quote?

The standard is only saying that it is not necessary to apply case folding again after canonical normalization. In other words, canonical normalization (to NFC or NFD) does not change any character from uppercase to lowercase or vice versa. This doesn't answer your first question.

It is not saying whether or not it is necessary to do canonical normalization again after case folding.
