How can I remove accents on a string? [duplicate]

Tags:

.net

Possible Duplicate:
How do I remove diacritics (accents) from a string in .NET?

I have the following string

áéíóú

which I need to convert to

aeiou

How can I achieve this? (I don't need to compare strings; I need to save the new string.)

Not a duplicate of How do I remove diacritics (accents) from a string in .NET?. The accepted answer there doesn't explain anything and that's why I've "reopened" it.

613

asked Sep 22 '10 13:09

2 Answers

It depends on requirements. For most uses, normalising to NFD and then filtering out all combining characters will do. In some cases, normalising to NFKD is more appropriate (if you also want to remove some further distinctions between characters, such as ligatures and other compatibility forms).

Some other distinctions will not be caught by this, notably stroked Latin characters, which have no decomposition. For some of these there is also no clear locale-independent mapping (should ł be considered equivalent to l or w?), so you may need to customise beyond this.
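To make these distinctions concrete, here is a minimal sketch that prints the UTF-16 code units a string normalises to. The class name NormalizationDemo and the helper Describe are hypothetical names used only for illustration; the only framework calls are string.Normalize and string.Join. All sample characters are in the Basic Multilingual Plane, so each code unit is one code point.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: normalises s to the given form and lists the
    // resulting UTF-16 code units as U+XXXX, so decompositions are visible.
    public static string Describe(string s, NormalizationForm form)
    {
        return string.Join(" ",
            s.Normalize(form).Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Under these assumptions, Describe("\u00E9", NormalizationForm.FormD) gives "U+0065 U+0301" (é becomes e plus a combining acute accent), and Describe("\uFB01", NormalizationForm.FormD) gives "U+FB01", because the ﬁ ligature only has a compatibility decomposition; with FormKD it gives "U+0066 U+0069". Describe("\u0142", NormalizationForm.FormKD) gives "U+0142": ł has no decomposition under either form, which is why it needs custom handling.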

There are also some cases where NFD and NFKD don't behave quite as expected, because decompositions are kept stable for consistency between Unicode versions.

Hence:

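// Requires: using System; using System.Collections.Generic; using System.Globalization; using System.Text;
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)
{
    // Decompose first, so that each accent becomes a separate combining character.
    foreach(char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
    {
      switch(CharUnicodeInfo.GetUnicodeCategory(c))
      {
        case UnicodeCategory.NonSpacingMark:
        case UnicodeCategory.SpacingCombiningMark:
        case UnicodeCategory.EnclosingMark:
          //do nothing: combining marks are dropped
          break;
        default:
          // Everything else passes through the caller's folding function.
          yield return customFolding(c);
          break;
      }
    }
}
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
{
  // Call the lazy overload; calling RemoveDiacritics here would build a string needlessly.
  return RemoveDiacriticsEnum(src, compatNorm, c => c);
}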
public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
{
  StringBuilder sb = new StringBuilder();
  foreach(char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
    sb.Append(c);
  return sb.ToString();
}
public static string RemoveDiacritics(string src, bool compatNorm)
{
  return RemoveDiacritics(src, compatNorm, c => c);
}

Here the overloads without customFolding provide a default for the problem cases mentioned above: they pass an identity function, which simply leaves those characters as they are. We've also split building a string from generating the enumeration of characters, so we need not be wasteful in cases where there's no need for string manipulation on the result (say we were going to write the chars to output next, or do some further char-by-char manipulation).

As an example, a case where we also want to convert ł and Ł to l and L, but have no other specialised concerns, could use:

private static char NormaliseLWithStroke(char c)
{
  switch(c)
  {
     case 'ł':
       return 'l';
     case 'Ł':
       return 'L';
     default:
       return c;
  }
}

Passing this as customFolding to the methods above removes the stroke in this case, along with the decomposable diacritics.
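As a usage sketch, the block below condenses the string-building overload and the folding function into one class so that it can be compiled on its own. The class name DiacriticsSketch is a hypothetical wrapper, not part of the answer's code; the logic is intended to mirror the methods above rather than replace them.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Condensed stand-in for RemoveDiacritics(src, compatNorm, customFolding).
    public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
    {
        NormalizationForm form = compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD;
        StringBuilder sb = new StringBuilder(src.Length);
        foreach (char c in src.Normalize(form))
        {
            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                case UnicodeCategory.NonSpacingMark:
                case UnicodeCategory.SpacingCombiningMark:
                case UnicodeCategory.EnclosingMark:
                    break; // combining marks are dropped
                default:
                    sb.Append(customFolding(c)); // everything else is folded and kept
                    break;
            }
        }
        return sb.ToString();
    }

    // Same mapping as NormaliseLWithStroke above: only the stroked L is folded.
    public static char NormaliseLWithStroke(char c)
    {
        switch (c)
        {
            case 'ł': return 'l';
            case 'Ł': return 'L';
            default: return c;
        }
    }
}
```

With this sketch, DiacriticsSketch.RemoveDiacritics("áéíóú", false, c => c) returns "aeiou". For "Łódź", the identity folding returns "Łodz", because the stroke is not a combining mark, while passing DiacriticsSketch.NormaliseLWithStroke returns "Lodz".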

187

answered Oct 19 '22 20:10

Jon Hanna

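// Requires: using System.Text; (for StringBuilder and NormalizationForm)
public string RemoveDiacritics(string input)
{
    // Decompose so that accents become separate non-spacing marks.
    string stFormD = input.Normalize(NormalizationForm.FormD);
    int len = stFormD.Length;
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < len; i++)
    {
        System.Globalization.UnicodeCategory uc = System.Globalization.CharUnicodeInfo.GetUnicodeCategory(stFormD[i]);
        // Keep every character that is not a non-spacing mark.
        if (uc != System.Globalization.UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[i]);
        }
    }
    // Recompose whatever remains into Form C before returning.
    return (sb.ToString().Normalize(NormalizationForm.FormC));
}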

answered Oct 19 '22 20:10

cichy
