How do I remove diacritics (accents) from a string in .NET?

Tags: string, .net, diacritics

I'm trying to convert some French Canadian strings. Basically, I'd like to strip the French accent marks from the letters while keeping the letters themselves (e.g. convert é to e, so crème brûlée would become creme brulee).

What is the best method for achieving this?


asked Oct 30 '08 02:10

James Hall

2 Answers

I haven't used this method myself, but Michael Kaplan describes one in his blog post (with a confusing title) about stripping diacritics: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)

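```csharp
using System.Globalization;
using System.Text;

static string RemoveDiacritics(string text)
{
    // Decompose: "é" becomes "e" followed by a combining acute accent (U+0301).
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder(capacity: normalizedString.Length);

    for (int i = 0; i < normalizedString.Length; i++)
    {
        char c = normalizedString[i];
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        // Keep everything except non-spacing (combining) marks.
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    // Recompose whatever is left into its canonical composed form.
    return stringBuilder
        .ToString()
        .Normalize(NormalizationForm.FormC);
}
```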

Note that this is a follow-up to his earlier post: Stripping diacritics....

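The approach uses String.Normalize with NormalizationForm.FormD to decompose the input string into its constituent parts (basically separating the "base" characters from the combining diacritical marks), then scans the result and retains everything that is not a non-spacing mark (UnicodeCategory.NonSpacingMark), and finally recomposes the remainder with NormalizationForm.FormC. It's a little complicated, but you're looking at a complicated problem.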
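To make the decompose/filter/recompose steps concrete, here is a small illustrative sketch. It assumes C# 9 top-level statements, and the helper names Decompose and StripMarks are hypothetical; StripMarks is a compact LINQ restatement of the loop in the answer's code, not Kaplan's own implementation. It shows that FormD turns "é" into two chars and that dropping the NonSpacingMark chars yields the plain letters.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper: FormD splits each precomposed accented letter
// into a base letter followed by one or more combining marks.
static string Decompose(string text) => text.Normalize(NormalizationForm.FormD);

// Hypothetical helper: drop every char in category NonSpacingMark (Unicode "Mn"),
// then recompose whatever is left with FormC.
static string StripMarks(string text) =>
    new string(text
        .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
        .ToArray())
    .Normalize(NormalizationForm.FormC);

// Escapes keep the input precomposed regardless of how the source file is saved.
string input = "cr\u00E8me br\u00FBl\u00E9e";   // "crème brûlée"
string decomposed = Decompose(input);

// è, û and é each become two chars, so the string grows by three.
Console.WriteLine($"{input.Length} -> {decomposed.Length}");   // 12 -> 15

// Inspect the decomposition of a single "é" (U+00E9).
foreach (char c in Decompose("\u00E9"))
{
    Console.WriteLine($"U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}");
}
// U+0065 LowercaseLetter
// U+0301 NonSpacingMark

Console.WriteLine(StripMarks(decomposed));   // creme brulee
```

The final FormC pass matters only for characters that still carry a mark the filter kept (for example a spacing mark); for plain French text it simply returns the letters unchanged.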

Of course, if you're limiting yourself to French, you could probably get away with the simple table-based approach in How to remove accents and tilde in a C++ std::string, as recommended by @David Dibben.
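For comparison, here is a C# sketch of the table-based idea; it is not the code from the linked C++ question. The table is a hypothetical one covering the lowercase and uppercase accented letters used in French, and the sketch assumes C# 9 top-level statements and a UTF-8 source file. Ligatures such as œ and æ are deliberately left untouched because they would expand to two letters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical French-only table: the letter at index i in Accented
// maps to the letter at index i in Plain.
const string Accented = "àâçéèêëîïôùûüÿÀÂÇÉÈÊËÎÏÔÙÛÜŸ";
const string Plain    = "aaceeeeiiouuuyAACEEEEIIOUUUY";

var table = new Dictionary<char, char>();
for (int i = 0; i < Accented.Length; i++)
{
    table[Accented[i]] = Plain[i];
}

// Replace each char found in the table; copy every other char unchanged.
string RemoveFrenchAccents(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        sb.Append(table.TryGetValue(c, out char plain) ? plain : c);
    }
    return sb.ToString();
}

Console.WriteLine(RemoveFrenchAccents("crème brûlée"));   // creme brulee
```

Unlike the normalization approach, any letter missing from the table passes through unchanged, so accents from other languages (e.g. ñ) would survive; the trade-off is a simple, predictable one-char-per-char lookup.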

answered Oct 06 '22 11:10

Blair Conrad

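This did the trick for me:

```csharp
using System.Text;

string accentedStr = "crème brûlée";
// On .NET Core and .NET 5+, ISO-8859-8 is only available after registering the
// code-pages provider: Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
byte[] tempBytes = Encoding.GetEncoding("ISO-8859-8").GetBytes(accentedStr);
// ISO-8859-8 (Hebrew) has no accented Latin letters, so the encoder's fallback
// replaces them; the resulting bytes are then read back as text.
string asciiStr = Encoding.UTF8.GetString(tempBytes);
```

Quick and short!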

answered Oct 06 '22 12:10

azrafe7

Related questions

What is the difference between IQueryable<T> and IEnumerable<T>?
How is Math.Pow() implemented in .NET Framework?
Better way to check if a Path is a File or a Directory?
Passing arguments to C# generic new() of templated type
Where Is Machine.Config?
Why does Math.Round(2.5) return 2 instead of 3?
A generic list of anonymous class
.NET / C# - Convert char[] to string
How do I truncate a .NET string?
If strings are immutable in .NET, then why does Substring take O(n) time?
Replacing .NET WebBrowser control with a better browser, like Chrome? [closed]
ArrayList vs List<> in C#
What is the syntax for an inner join in LINQ to SQL?
Difference between Math.Floor() and Math.Truncate()
Asynchronously wait for Task<T> to complete with timeout
Deserialize JSON object into dynamic object using Json.net
What is the difference between Nullable<T>.HasValue or Nullable<T> != null?
Using async/await for multiple tasks
HashSet vs. List performance
What are your favorite extension methods for C#? (codeplex.com/extensionoverflow)
