I'm studying the string.Normalize() method, and I thought it was used to compare strings for equality when they use different Unicode representations.
Here's what I've done so far. Is string.Equals() not what I'm supposed to use here?
string stra = "á";
string straNorm = stra.Normalize();
string strFormC = stra.Normalize(NormalizationForm.FormC);
string strFormD = stra.Normalize(NormalizationForm.FormD);
string strFormKC = stra.Normalize(NormalizationForm.FormKC);
string strFormKD = stra.Normalize(NormalizationForm.FormKD);
Console.WriteLine("norm {0}",straNorm);
Console.WriteLine("C {0}", strFormC);
Console.WriteLine("D {0}", strFormD);
Console.WriteLine("KC {0}", strFormKC);
Console.WriteLine("KD {0}", strFormKD);
Console.WriteLine("a".Equals(stra)); //false
Console.WriteLine("a".Equals(straNorm)); //false
Console.WriteLine("a".Equals(stra.Normalize())); //false
Console.WriteLine("a".Equals(strFormC)); //false
Console.WriteLine("a".Equals(strFormKC)); //false
Console.WriteLine("a".Equals(strFormKD)); //false
strcmp compares two C strings. It returns 0 when both strings have the same length and contain exactly the same characters at every index. For example, i will be 0 in the following code: char str1[] = "Look Here"; char str2[] = "Look Here"; int i = strcmp(str1, str2);
Java String compareTo() method: it returns 0 if the string is equal to the other string, a value less than 0 if the string is lexicographically less than the other string, and a value greater than 0 if it is lexicographically greater. The result depends on the character values; length only decides the result when one string is a prefix of the other.
C# String.Compare(): it compares the first string with the second and returns an integer that indicates their relative sort order. It returns 0 if both strings are equal, a value greater than 0 if the first string follows the second, and a value less than 0 if the first string precedes the second.
You can use string.Compare() with CultureInfo.InvariantCulture and CompareOptions.IgnoreNonSpace. Below, I have created a method called CompareStrings(string str1, string str2) that returns a boolean:
public bool CompareStrings(string str1, string str2)
{
    return string.Compare(str1, str2, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace) == 0;
}
Calling the method to compare strings:
Console.WriteLine(CompareStrings("a", "á"));
Console.WriteLine(CompareStrings("a", "a"));
Console.WriteLine(CompareStrings("a", "b"));
Results:
True
True
False
The documentation defines CompareOptions.IgnoreNonSpace as follows: it "indicates that the string comparison must ignore nonspacing combining characters, such as diacritics. The Unicode Standard defines combining characters as characters that are combined with base characters to produce a new character. Nonspacing combining characters do not occupy a spacing position by themselves when rendered."
You can find out more about CompareOptions in the official documentation.
After normalization to form D or KD, the string contains two characters: the base letter and a combining diacritical mark. Normalization does not remove the accent, so to match "a" you have to compare against the base letter only.
string stra = "á";
string strFormC = stra.Normalize(NormalizationForm.FormC);
string strFormD = stra.Normalize(NormalizationForm.FormD);
string strFormKC = stra.Normalize(NormalizationForm.FormKC);
string strFormKD = stra.Normalize(NormalizationForm.FormKD);
Console.WriteLine("C {0}", strFormC.Length); // 1
Console.WriteLine("D {0}", strFormD.Length); // 2
Console.WriteLine("KC {0}", strFormKC.Length); // 1
Console.WriteLine("KD {0}", strFormKD.Length); // 2
Console.WriteLine("a".Equals(strFormD[0].ToString())); // True
Console.WriteLine("a".Equals(strFormKD[0].ToString())); // True
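To illustrate what Normalize() is actually meant for, here is a minimal sketch that compares two different encodings of the same accented letter. The class name NormalizationDemo and the helper AreCanonicallyEqual are hypothetical names made up for this example; the sketch assumes both inputs should be brought to form C before an ordinal comparison.

```csharp
using System;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: normalizes both strings to form C,
    // then compares them ordinally (code unit by code unit).
    public static bool AreCanonicallyEqual(string left, string right)
    {
        return string.Equals(
            left.Normalize(NormalizationForm.FormC),
            right.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal);
    }

    public static void Main()
    {
        string precomposed = "\u00E1";  // single code point: a with acute
        string decomposed = "a\u0301";  // "a" followed by a combining acute accent

        // Different code units, so an ordinal comparison fails.
        Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

        // Same text once both sides are in the same normalization form.
        Console.WriteLine(AreCanonicallyEqual(precomposed, decomposed)); // True

        // Normalization keeps the accent, so the plain letter still differs.
        Console.WriteLine(AreCanonicallyEqual("a", precomposed)); // False
    }
}
```

So normalization makes equivalent representations of the same character compare equal; it does not make "á" equal to "a".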
We can remove all diacritical marks with a regular expression. \p{M} is the Unicode category that matches all combining marks (diacritics).
string stra = "á";
string strFormD = stra.Normalize(NormalizationForm.FormD);
var result = Regex.Replace(strFormD, @"\p{M}", string.Empty);
Console.WriteLine("a".Equals(result)); // True
Console.WriteLine("a" == result); // True
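The same idea can be sketched without a regular expression, for whole strings rather than a single letter. This is only an illustration under stated assumptions: DiacriticsDemo and RemoveDiacritics are hypothetical names, and the check below drops only the NonSpacingMark category, which is narrower than \p{M} (that class also covers spacing and enclosing marks).

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper: decomposes the text, drops nonspacing marks,
    // then recomposes whatever is left into form C.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep every character that is not a nonspacing combining mark.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("\u00E1") == "a");    // True
        Console.WriteLine(RemoveDiacritics("cr\u00E8me br\u00FBl\u00E9e")); // creme brulee
    }
}
```

Letters that have no canonical decomposition, such as "ø", pass through unchanged with this approach.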