
How can I get true if we compare a to á?

Tags: string, c#

I'm studying the string.Normalize() method, and I thought it was used to compare strings for equality when they use different Unicode representations.

Here's what I've done so far. Is string.Equals() not what I'm supposed to use here?

        string stra = "á";
        string straNorm = stra.Normalize();
        string strFormC = stra.Normalize(NormalizationForm.FormC);
        string strFormD = stra.Normalize(NormalizationForm.FormD);
        string strFormKC = stra.Normalize(NormalizationForm.FormKC);
        string strFormKD = stra.Normalize(NormalizationForm.FormKD);
        Console.WriteLine("norm {0}",straNorm);
        Console.WriteLine("C {0}", strFormC);
        Console.WriteLine("D {0}", strFormD);
        Console.WriteLine("KC {0}", strFormKC);
        Console.WriteLine("KD {0}", strFormKD);

        Console.WriteLine("a".Equals(stra)); //false
        Console.WriteLine("a".Equals(straNorm)); //false
        Console.WriteLine("a".Equals(stra.Normalize())); //false
        Console.WriteLine("a".Equals(strFormC)); //false
        Console.WriteLine("a".Equals(strFormKC)); //false
        Console.WriteLine("a".Equals(strFormKD)); //false
Ronald Abellano asked Apr 06 '19 10:04



2 Answers

You can use string.Compare() with CultureInfo.InvariantCulture and CompareOptions.IgnoreNonSpace. Below, I have created a method called CompareStrings(string str1, string str2) that returns a boolean:

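// Requires: using System.Globalization;
// Declared static so it can be called directly from a static Main method.
public static bool CompareStrings(string str1, string str2)
{
    // IgnoreNonSpace makes the comparison skip nonspacing combining marks (diacritics).
    return string.Compare(str1, str2, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace) == 0;
}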

Calling the method to compare strings:

Console.WriteLine(CompareStrings("a", "á"));
Console.WriteLine(CompareStrings("a", "a"));
Console.WriteLine(CompareStrings("a", "b"));

Results:

True
True
False

The documentation defines CompareOptions.IgnoreNonSpace as follows: it "indicates that the string comparison must ignore nonspacing combining characters, such as diacritics. The Unicode Standard defines combining characters as characters that are combined with base characters to produce a new character. Nonspacing combining characters do not occupy a spacing position by themselves when rendered."

You can find out more about CompareOptions in the .NET documentation.
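If case should be ignored as well, the same idea extends by combining flags. The following is only a sketch, written as a C# 9+ top-level program and assuming the default globalization settings (not invariant-globalization mode); the helper name EqualsIgnoringMarksAndCase is made up for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of .NET): culture-invariant comparison that
// ignores both diacritics (IgnoreNonSpace) and letter case (IgnoreCase).
static bool EqualsIgnoringMarksAndCase(string left, string right)
{
    CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
    CompareOptions options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;
    return compareInfo.Compare(left, right, options) == 0;
}

// "A" vs precomposed "á" (U+00E1)
Console.WriteLine(EqualsIgnoringMarksAndCase("A", "\u00E1"));
// "a" vs "a" followed by U+0301 COMBINING ACUTE ACCENT
Console.WriteLine(EqualsIgnoringMarksAndCase("a", "a\u0301"));
// Different base letters still compare as different
Console.WriteLine(EqualsIgnoringMarksAndCase("a", "b"));
```

With default settings I would expect the first two lines to print True and the last one False. Note that this is a linguistic comparison, so it is not a substitute for an ordinal check when exact code points matter.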

Aderbal Farias answered Oct 12 '22 23:10


After normalization to form D or KD, the string contains two characters: the base letter followed by a combining diacritical mark. The comparison must therefore be made against the base letter only.

string stra = "á";

string strFormC = stra.Normalize(NormalizationForm.FormC);
string strFormD = stra.Normalize(NormalizationForm.FormD);
string strFormKC = stra.Normalize(NormalizationForm.FormKC);
string strFormKD = stra.Normalize(NormalizationForm.FormKD);

Console.WriteLine("C {0}", strFormC.Length); // 1
Console.WriteLine("D {0}", strFormD.Length); // 2
Console.WriteLine("KC {0}", strFormKC.Length); // 1
Console.WriteLine("KD {0}", strFormKD.Length); // 2

Console.WriteLine("a".Equals(strFormD[0].ToString())); // True
Console.WriteLine("a".Equals(strFormKD[0].ToString())); // True
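To illustrate what Normalize() is actually meant for, here is a small sketch (C# 9+ top-level program). It compares two different encodings of the same character "á": normalization makes them ordinally equal, but it never makes "á" equal to "a". The helper name EqualsNormalized is hypothetical.

```csharp
using System;
using System.Text;

string precomposed = "\u00E1";   // á as a single code point
string decomposed = "a\u0301";   // a followed by U+0301 COMBINING ACUTE ACCENT

// Hypothetical helper: ordinal equality after bringing both strings to form C.
static bool EqualsNormalized(string left, string right) =>
    string.Equals(left.Normalize(NormalizationForm.FormC),
                  right.Normalize(NormalizationForm.FormC),
                  StringComparison.Ordinal);

Console.WriteLine(precomposed == decomposed);                  // False: different code points
Console.WriteLine(EqualsNormalized(precomposed, decomposed));  // True: same character after normalization
Console.WriteLine(EqualsNormalized("a", precomposed));         // False: the accent is still there

// Form D splits the precomposed character into base letter + combining mark.
foreach (char c in precomposed.Normalize(NormalizationForm.FormD))
    Console.WriteLine("U+{0:X4}", (int)c);                     // U+0061, then U+0301
```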

We can remove all diacritical marks with a regular expression.

\p{M} is the Unicode category that matches all combining marks (diacritics).

string stra = "á";

string strFormD = stra.Normalize(NormalizationForm.FormD);

var result = Regex.Replace(strFormD, @"\p{M}", string.Empty);

Console.WriteLine("a".Equals(result)); // True
Console.WriteLine("a" == result); // True
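The same stripping step can also be done without a regular expression. The sketch below (C# 9+ top-level program) filters on the three Unicode mark categories that \p{M} covers; RemoveDiacritics is a hypothetical helper name, and recomposing to form C at the end is my own choice rather than something the approach requires.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: strips combining marks without a regular expression.
static string RemoveDiacritics(string text)
{
    // Decompose so that each accent becomes a separate combining character.
    string decomposed = text.Normalize(NormalizationForm.FormD);
    var builder = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
        // These three categories together correspond to \p{M}.
        bool isMark = category == UnicodeCategory.NonSpacingMark
                   || category == UnicodeCategory.SpacingCombiningMark
                   || category == UnicodeCategory.EnclosingMark;
        if (!isMark) builder.Append(c);
    }
    // Recompose whatever is left (an assumption: callers want form C back).
    return builder.ToString().Normalize(NormalizationForm.FormC);
}

Console.WriteLine(RemoveDiacritics("\u00E1") == "a");                // True
Console.WriteLine(RemoveDiacritics("Cr\u00E8me br\u00FBl\u00E9e"));  // Creme brulee
```

Keep in mind that this only removes marks that decompose; letters such as "ø" or "ł" have no canonical decomposition and pass through unchanged.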
Alexander Petrov answered Oct 13 '22 00:10