I am wondering how to normalize strings (containing UTF-8/UTF-16) in C/C++. In .NET there is a function for this, String.Normalize.
I used UTF8-CPP in the past but it does not provide such a function. ICU and Qt provide string normalization but I prefer lightweight solutions.
Is there any "lightweight" solution for this?
Unicode normalization converts the different representations to the same form so they can be compared. All conforming processors must support the NFC format. They are also free to support any or all of the other formats defined by Unicode, and they can support their own formats if they want.
NFC. Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
NFKD. Normalization Form Compatibility Decomposition. Characters are decomposed by compatibility, and multiple combining characters are arranged in a specific order.
string.normalize() is a built-in JavaScript method that returns a Unicode normalization form of a given input string. If the input is not a string, it is first converted to a string and then normalized.
Unicode normalization is our solution to both canonical and compatibility equivalence issues. In normalization, there are two directions and two types of conversions we can make. The two types we have already covered, canonical and compatibility.
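To make the comparison problem concrete, here is a minimal sketch in C showing that two canonically equivalent spellings of "é" differ at the byte level. The names E_ACUTE_NFC, E_ACUTE_NFD and bytes_equal are made up for this illustration; they do not come from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "é" as a single precomposed code point, U+00E9, encoded in UTF-8 (NFC form). */
static const char E_ACUTE_NFC[] = "\xC3\xA9";

/* "é" as 'e' (U+0065) followed by the combining acute accent U+0301,
   encoded in UTF-8 (NFD form). */
static const char E_ACUTE_NFD[] = "e\xCC\x81";

/* Hypothetical helper: byte-wise equality, which is all strcmp can check.
   It has no notion of Unicode canonical equivalence. */
static bool bytes_equal(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Both strings render as the same character, but bytes_equal(E_ACUTE_NFC, E_ACUTE_NFD) is false because one is two bytes long and the other three. Normalizing both sides to the same form (for example NFC) before comparing is what removes this difference.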
As I wrote in another question, utf8proc is a very nice, lightweight, library for basic Unicode functionality, including Unicode string normalization.
For Windows, there is the NormalizeString() function (unfortunately for Windows Vista and later only, as far as I can see on MSDN):
http://msdn.microsoft.com/en-us/library/windows/desktop/dd319093%28v=vs.85%29.aspx
It's the simplest way to go that I have found so far. I guess it's quite lightweight too.
int NormalizeString(
_In_ NORM_FORM NormForm,
_In_ LPCWSTR lpSrcString,
_In_ int cwSrcLength,
_Out_opt_ LPWSTR lpDstString,
_In_ int cwDstLength
);
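As a rough sketch of how this is typically called: ask for an estimated buffer size first (NULL destination, length 0), allocate, then normalize, retrying if the estimate turns out too small. The wrapper name normalize_nfc is hypothetical, and the non-Windows branch is only a stub so the file still compiles elsewhere. On Windows this assumes linking against Normaliz.lib.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical wrapper: returns a malloc'd NFC copy of a NUL-terminated
   UTF-16 string, or NULL on failure. The caller frees the result. */
wchar_t *normalize_nfc(const wchar_t *src) {
    if (src == NULL) return NULL;
#ifdef _WIN32
    /* First call: NULL destination and length 0 ask for an estimated size.
       A source length of -1 means "NUL-terminated, include the terminator". */
    int cap = NormalizeString(NormalizationC, src, -1, NULL, 0);
    while (cap > 0) {
        wchar_t *dst = malloc((size_t)cap * sizeof *dst);
        if (dst == NULL) return NULL;
        int n = NormalizeString(NormalizationC, src, -1, dst, cap);
        if (n > 0) return dst;          /* success: dst is NUL-terminated */
        DWORD err = GetLastError();     /* read before free() can change it */
        free(dst);
        if (err != ERROR_INSUFFICIENT_BUFFER) return NULL;
        cap = -n;                       /* negative return = new size estimate */
    }
    return NULL;
#else
    /* Stub: NormalizeString is a Win32 API and is not available here. */
    return NULL;
#endif
}
```

The retry loop is there because the first call only returns an estimate; when the buffer is too small, the function fails with ERROR_INSUFFICIENT_BUFFER and returns the negative of a new suggested size.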