I'm looking for pseudocode or sample code to convert high-bit ASCII characters (such as Ü, which is extended ASCII 154) into their closest 7-bit equivalents (U, which is ASCII 85). My initial guess is that, since only about 25 extended characters resemble 7-bit ASCII characters, a translation array would have to be used. Let me know if you can think of anything else.


How do I translate 8bit characters into 7bit characters? (i.e. Ü to U)

2 Answers

For .NET users the article in CodeProject (thanks to GvS's tip) does indeed answer the question more correctly than any other I've seen so far.

However the code in that article (in solution #1) is cumbersome. Here's a compact version:

    // Based on http://www.codeproject.com/Articles/13503/Stripping-Accents-from-Latin-Characters-A-Foray-in
    // Requires: using System.Linq; using System.Text;
    private static string LatinToAscii(string inString)
    {
        var newStringBuilder = new StringBuilder();
        // Decompose accented characters, then keep only the 7-bit ASCII ones.
        newStringBuilder.Append(inString.Normalize(NormalizationForm.FormKD)
                                        .Where(x => x < 128)
                                        .ToArray());
        return newStringBuilder.ToString();
    }

To expand a bit on the answer, this method uses String.Normalize which:

Returns a new string whose textual value is the same as this string, but whose binary representation is in the specified Unicode normalization form.

Specifically in this case we use the NormalizationForm FormKD, described in those same MSDN docs as such:

FormKD - Indicates that a Unicode string is normalized using full compatibility decomposition.

For more information about Unicode normalization forms, see Unicode Annex #15.
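To make the decomposition step concrete, here is a minimal sketch of what FormKD does to Ü before the filter runs. The class name `NormalizationDemo` and the helper `CodePoints` are made up for this illustration and are not part of the CodeProject article.

```csharp
using System;
using System.Linq;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
    public static string CodePoints(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        string input = "\u00DC"; // Ü as a single precomposed character
        string decomposed = input.Normalize(NormalizationForm.FormKD);

        Console.WriteLine(CodePoints(input));      // U+00DC
        Console.WriteLine(CodePoints(decomposed)); // U+0055 U+0308

        // Dropping every code unit at or above 128 removes the combining
        // diaeresis (U+0308) and leaves only the base letter.
        string ascii = new string(decomposed.Where(c => c < 128).ToArray());
        Console.WriteLine(ascii); // U
    }
}
```

One consequence worth keeping in mind: characters that have no decomposition, such as ß or ø, stay above 127 after normalization, so the `x < 128` filter removes them entirely rather than replacing them.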

177

answered Oct 05 '22 04:10

sinelaw

Most languages have a standard way to replace accented characters with plain ASCII, but the rules depend on the language, and they often involve replacing a single accented character with two ASCII ones. For example, in German ü becomes ue. So if you want to handle natural languages properly, it's a lot more complicated than you think it is.
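As a rough sketch of that point, a language-specific table can be applied before falling back to plain ASCII filtering. The class `GermanTransliteration`, its `ToAscii` method, and the mapping table are assumptions made for illustration; the table covers only the conventional German replacements and is far from a complete solution.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GermanTransliteration
{
    // Hypothetical mapping for illustration: conventional German
    // replacements only. Other languages need their own tables.
    private static readonly Dictionary<char, string> Replacements =
        new Dictionary<char, string>
        {
            ['ä'] = "ae", ['ö'] = "oe", ['ü'] = "ue",
            ['Ä'] = "Ae", ['Ö'] = "Oe", ['Ü'] = "Ue",
            ['ß'] = "ss",
        };

    public static string ToAscii(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (Replacements.TryGetValue(c, out var replacement))
                result.Append(replacement); // one character may become two
            else if (c < 128)
                result.Append(c);           // already 7-bit ASCII
            // Any other non-ASCII character is dropped in this sketch.
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ToAscii("Müller")); // Mueller
        Console.WriteLine(ToAscii("Straße")); // Strasse
    }
}
```

Note that this gives a different result from the normalization approach, which would turn "Müller" into "Muller"; which one is right depends on the language and the use case.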

answered Oct 05 '22 04:10

Mark Baker


Michael Pryor
