Convert a Unicode string to an escaped ASCII string

How can I convert this string:

This string contains the Unicode character Pi(π) 

into an escaped ASCII string:

This string contains the Unicode character Pi(\u03c0)

and vice versa?

The current Encoding available in C# converts the π character to "?". I need to preserve that character.
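The behaviour described here can be reproduced with a short sketch. It assumes the encoding in question is Encoding.ASCII (the question does not name it); the class and method names are hypothetical and only for illustration.

```csharp
using System.Text;

static class AsciiLossDemo
{
    // Encodes to ASCII bytes and decodes them again. Characters outside
    // the ASCII range are replaced by the encoder's default fallback,
    // which substitutes "?".
    public static string RoundTripThroughAscii(string value)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(value);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Calling AsciiLossDemo.RoundTripThroughAscii("Pi(π)") returns "Pi(?)", so the original character cannot be recovered from the result.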

Ali asked Oct 23 '09 19:10



1 Answer

This converts in both directions between a Unicode string and the \uXXXX escaped format.

    using System;
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;

    class Program {
        static void Main( string[] args ) {
            // \u03c0 is the C# escape for the lowercase pi character
            string unicodeString = "This function contains a unicode character pi (\u03c0)";

            Console.WriteLine( unicodeString );

            string encoded = EncodeNonAsciiCharacters( unicodeString );
            Console.WriteLine( encoded );

            string decoded = DecodeEncodedNonAsciiCharacters( encoded );
            Console.WriteLine( decoded );
        }

        static string EncodeNonAsciiCharacters( string value ) {
            StringBuilder sb = new StringBuilder();
            foreach( char c in value ) {
                if( c > 127 ) {
                    // This character is too big for ASCII
                    string encodedValue = "\\u" + ((int) c).ToString( "x4" );
                    sb.Append( encodedValue );
                }
                else {
                    sb.Append( c );
                }
            }
            return sb.ToString();
        }

        static string DecodeEncodedNonAsciiCharacters( string value ) {
            // Match \u followed by exactly four hexadecimal digits
            return Regex.Replace(
                value,
                @"\\u(?<Value>[a-fA-F0-9]{4})",
                m => {
                    return ((char) int.Parse( m.Groups["Value"].Value, NumberStyles.HexNumber )).ToString();
                } );
        }
    }

Outputs:

    This function contains a unicode character pi (π)
    This function contains a unicode character pi (\u03c0)
    This function contains a unicode character pi (π)
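One limitation of the approach above is that input which already contains a literal backslash sequence such as \u0041 is turned into "A" on decoding, so the round trip is not lossless. The following is a minimal sketch of one way around that, assuming it is acceptable to also escape the backslash itself as \\. The class name EscapedAscii and its method names are hypothetical and not part of the original answer.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class EscapedAscii
{
    // Escapes backslashes as \\ and every UTF-16 code unit above 127 as \uXXXX.
    public static string Encode(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c == '\\')
                sb.Append(@"\\");   // two backslash characters
            else if (c > 127)
                sb.Append("\\u").Append(((int)c).ToString("x4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Reverses Encode: \\ becomes a single backslash and \uXXXX becomes
    // the UTF-16 code unit with that hexadecimal value.
    public static string Decode(string value)
    {
        return Regex.Replace(
            value,
            @"\\(?:\\|u([0-9a-fA-F]{4}))",
            m => m.Groups[1].Success
                ? ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                : "\\");
    }
}
```

With this sketch, EscapedAscii.Encode("Pi(π)") gives Pi(\u03c0), and text that already contains a backslash survives Decode(Encode(...)) unchanged. Because C# strings are UTF-16, a character outside the Basic Multilingual Plane is written as two escapes, one per surrogate: U+1F600 becomes \ud83d\ude00.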

Adam Sills answered Oct 13 '22 05:10