Say you've loaded a text file into a string, and you'd like to convert all Unicode escapes into actual Unicode characters inside of the string. Example: <blockquote> "The following is the top half of an integral character in Unicode '\u2320', and this is the lower half '\U2321'." </blockquote>


How do I convert Unicode escape sequences to Unicode characters in a .NET string?

1 Answer

The answer is simple and works well with strings up to at least several thousand characters.

Example 1:

Regex rx = new Regex( @"\\[uU]([0-9A-Fa-f]{4})" );
result = rx.Replace( result, match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString() );

Example 2:

Regex rx = new Regex( @"\\[uU]([0-9A-Fa-f]{4})" );
result = rx.Replace( result, delegate (Match match) { return ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString(); } );

The first example makes the replacement using a lambda expression (C# 3.0), and the second uses an anonymous delegate, which should work with C# 2.0. Both examples assume using directives for System.Globalization (NumberStyles) and System.Text.RegularExpressions (Regex, Match).

To break down what's going on here, first we create a regular expression that matches a backslash, a 'u' or 'U', and exactly four hexadecimal digits (upper or lower case):

new Regex( @"\\[uU]([0-9A-Fa-f]{4})" );

Then we call Replace() with the string 'result' and an anonymous method (the lambda expression in the first example and the delegate in the second; the delegate could also be a regular method) that converts each match of the regular expression found in the string.
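As a rough sketch of the "regular method" option mentioned above, the conversion can be wrapped in a small helper class. The names UnicodeEscapeDecoder, EscapePattern, ReplaceEscape and Decode are made up for this illustration and are not part of the original answer; the pattern here accepts hex digits in either case.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class; all names are illustrative only.
public static class UnicodeEscapeDecoder
{
    // Backslash, then u or U, then exactly four hex digits (either case).
    private static readonly Regex EscapePattern =
        new Regex(@"\\[uU]([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // A named method used as the MatchEvaluator instead of a lambda.
    private static string ReplaceEscape(Match match)
    {
        // Skip the leading backslash and 'u', leaving the four hex digits.
        string hex = match.Value.Substring(2);
        int codeUnit = int.Parse(hex, NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture);
        return ((char)codeUnit).ToString();
    }

    // Replaces every matched escape in the input with its character.
    public static string Decode(string input)
    {
        return EscapePattern.Replace(input, new MatchEvaluator(ReplaceEscape));
    }
}
```

Calling UnicodeEscapeDecoder.Decode(text) on the loaded file contents should then return the text with each four-digit escape replaced. Text without any escapes comes back unchanged.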

The Unicode escape is processed like this:

((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString()

First, get the string representing the number part of the escape by skipping the first two characters (the backslash and the 'u'):

match.Value.Substring(2)

Parse that string using Int32.Parse(), which takes the string and the number format that Parse() should expect. In this case that format is a hex number:

NumberStyles.HexNumber

Then we cast the resulting number to a Unicode character:

(char)

Finally, we call ToString() on the Unicode character to get its string representation, which is the value passed back to Replace():

.ToString()
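The same steps can be written out one statement at a time, which may make the chain easier to follow. This is only a sketch: EscapeSteps and DecodeOne are hypothetical names, and the method assumes it is handed a single well-formed escape such as \u2320.

```csharp
using System.Globalization;

// Hypothetical helper; the names are illustrative only.
public static class EscapeSteps
{
    // Decodes one escape of the form \uXXXX or \UXXXX.
    public static string DecodeOne(string escape)
    {
        // Step 1: drop the backslash and the 'u', e.g. @"\u2320" -> "2320".
        string hex = escape.Substring(2);

        // Step 2: parse the digits as a hexadecimal number, "2320" -> 0x2320.
        int number = int.Parse(hex, NumberStyles.HexNumber,
                               CultureInfo.InvariantCulture);

        // Step 3: cast the number to a char (a UTF-16 code unit).
        char character = (char)number;

        // Step 4: turn the char into a one-character string.
        return character.ToString();
    }
}
```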

Note: Instead of grabbing the text to be converted with a Substring call, you could use the match parameter's GroupCollection and a subexpression in the regular expression to capture just the number ('2320'), but that's more complicated and less readable.
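For comparison, here is roughly what the group-based variant from the note could look like. Treat it as a sketch under assumptions: GroupBasedDecoder, EscapePattern and Decode are invented names, and the parentheses in the pattern are what make the digits available as group 1.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class GroupBasedDecoder
{
    // The parentheses capture the four hex digits as group 1.
    private static readonly Regex EscapePattern =
        new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

    public static string Decode(string input)
    {
        return EscapePattern.Replace(input, match =>
        {
            // Groups[0] is the whole match; Groups[1] holds only the digits.
            string hex = match.Groups[1].Value;
            int number = int.Parse(hex, NumberStyles.HexNumber,
                                   CultureInfo.InvariantCulture);
            return ((char)number).ToString();
        });
    }
}
```

Whether this reads better than Substring(2) is a matter of taste. It does avoid hard-coding the length of the prefix, which might matter if the pattern is later changed.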


answered Sep 19 '22 20:09

jr.
