Ignoring accented letters in string comparison

People also ask

How do you compare letters in a string?

In other words, strings are compared letter-by-letter. The algorithm to compare two strings is simple: Compare the first character of both strings. If the first character from the first string is greater (or less) than the other string's, then the first string is greater (or less) than the second.

What does string comparison mean?

string= compares two strings and is true if they are the same (corresponding characters are identical) but is false if they are not. The function equal calls string= if applied to two strings. The keyword arguments :start1 and :start2 are the places in the strings to start the comparison.

EDIT 2012-01-20: Oh boy! The solution was so much simpler and has been in the framework nearly forever. As pointed out by knightpfhor :

string.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);

Here's a function that strips diacritics from a string:

static string RemoveDiacritics(string text)
{
  string formD = text.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in formD)
  {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  return sb.ToString().Normalize(NormalizationForm.FormC);
}

More details on MichKap's blog (RIP...).

The principle is that is it turns 'é' into 2 successive chars 'e', acute. It then iterates through the chars and skips the diacritics.

"héllo" becomes "he<acute>llo", which in turn becomes "hello".

Debug.Assert("hello"==RemoveDiacritics("héllo"));

Note: Here's a more compact .NET4+ friendly version of the same function:

static string RemoveDiacritics(string text)
{
  return string.Concat( 
      text.Normalize(NormalizationForm.FormD)
      .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch)!=
                                    UnicodeCategory.NonSpacingMark)
    ).Normalize(NormalizationForm.FormC);
}

If you don't need to convert the string and you just want to check for equality you can use

string s1 = "hello";
string s2 = "héllo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
{
    // both strings are equal
}

or if you want the comparison to be case insensitive as well

string s1 = "HEllO";
string s2 = "héLLo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0)
{
    // both strings are equal
}

I had to do something similar but with a StartsWith method. Here is a simple solution derived from @Serge - appTranslator.

Here is an extension method:

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        if (str.Length >= value.Length)
            return string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
        else
            return false;            
    }

And for one liners freaks ;)

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        return str.Length >= value.Length && string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
    }

Accent incensitive and case incensitive startsWith can be called like this

value.ToString().StartsWith(str, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase)

The following method CompareIgnoreAccents(...) works on your example data. Here is the article where I got my background information: http://www.codeproject.com/KB/cs/EncodingAccents.aspx

private static bool CompareIgnoreAccents(string s1, string s2)
{
    return string.Compare(
        RemoveAccents(s1), RemoveAccents(s2), StringComparison.InvariantCultureIgnoreCase) == 0;
}

private static string RemoveAccents(string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

I think an extension method would be better:

public static string RemoveAccents(this string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

Then the use would be this:

if(string.Compare(s1.RemoveAccents(), s2.RemoveAccents(), true) == 0) {
   ...

A more simple way to remove accents:

    Dim source As String = "áéíóúç"
    Dim result As String

    Dim bytes As Byte() = Encoding.GetEncoding("Cyrillic").GetBytes(source)
    result = Encoding.ASCII.GetString(bytes)

Related questions
                            
                                How to handle click event in Button Column in Datagridview?
                            
                                Call a stored procedure with parameter in c#
                            
                                URL Encode and Decode in ASP.NET Core
                            
                                LINQPad [extension] methods [closed]
                            
                                EntityType has no key defined error
                            
                                Execute unit tests serially (rather than in parallel)
                            
                                Why C# fails to compare two object types with each other but VB doesn't?
                            
                                Why is infinity printed as "8" in the Windows 10 console?
                            
                                Should a return statement be inside or outside a lock?
                            
                                Where to find "Microsoft.VisualStudio.TestTools.UnitTesting" missing dll?
                            
                                Handling the window closing event with WPF / MVVM Light Toolkit
                            
                                Show a Form without stealing focus?
                            
                                How can I get the active screen dimensions?
                            
                                How do you implement a private setter when using an interface?
                            
                                Thread-safe List<T> property
                            
                                How to shut down the computer from C#
                            
                                Stripping out non-numeric characters in string
                            
                                Convert any object to a byte[]
                            
                                How to programmatically click a button in WPF?
                            
                                Shortcut for creating single item list in C#

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Ignoring accented letters in string comparison

Tags:

string

c#

localization

People also ask

Recent Activity

Donate For Us