Determine text encoding and convert to the default

Tags: c#, .net, f#

I have an input string in some alien encoding: "\\U+1043\\U+1072\\U+1073\\U+1072\\U+1088\\U+1080\\U+1090\\U+1085\\U+1086\\U+1089\\U+1090\\U+1100"

And I want to convert it to my default encoding (System.Text.Encoding.Default):

System.Text.Encoding.Default    {System.Text.SBCSCodePageEncoding}
    BodyName            "koi8-r"
    CodePage            1251
    DecoderFallback     {System.Text.InternalDecoderBestFitFallback}
    EncoderFallback     {System.Text.InternalEncoderBestFitFallback}
    EncodingName        "Cyrillic (Windows)"
    HeaderName          "windows-1251"
    IsBrowserDisplay    true
    IsBrowserSave       true
    IsMailNewsDisplay   true
    IsMailNewsSave      true
    IsReadOnly          true
    IsSingleByte        true
    WebName             "windows-1251"
    WindowsCodePage     1251

How can I determine its encoding, and how can I convert it?

asked Nov 29 '12 by RomanKovalev



1 Answer

I'm not sure if I really understand your question.

In .NET, once you have a string object, you don't need to care about different encodings. All .NET strings use the same encoding: Unicode (or more precisely: UTF-16).

Different text encodings only come into play when you turn a string object into a byte sequence (e.g. to write it to a text file) or vice versa. I assume this is what you mean. To convert a byte sequence from one encoding to another, you could write:

using System.Text;

byte[] input = ReadInput();                                    // e.g. raw bytes from a file
Encoding decoder = Encoding.GetEncoding("encoding of input");  // placeholder name
string str = decoder.GetString(input);                         // bytes -> UTF-16 string
Encoding encoder = Encoding.GetEncoding("encoding of output"); // placeholder name
byte[] output = encoder.GetBytes(str);                         // string -> bytes

Of course you need to replace "encoding of input" and "encoding of output" with proper encoding names; MSDN has a list of all supported encodings.
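
For instance, a minimal sketch assuming the input bytes are koi8-r and the target is windows-1251 (the two code pages that appear in the debugger dump above); the sample bytes are mine, chosen for illustration:

using System;
using System.Text;

class Transcode
{
    static void Main()
    {
        // On .NET Core / .NET 5+, legacy code pages require registering a
        // provider first (NuGet package System.Text.Encoding.CodePages);
        // on .NET Framework this line is unnecessary:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] input = { 0xF0, 0xD2, 0xC9, 0xD7, 0xC5, 0xD4 }; // "Привет" in koi8-r

        string str = Encoding.GetEncoding("koi8-r").GetString(input);       // decode
        byte[] output = Encoding.GetEncoding("windows-1251").GetBytes(str); // re-encode

        Console.WriteLine(str); // Привет
    }
}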

You need to know the encoding of the input, either by convention or based on metadata or something. You cannot reliably determine/guess an unknown encoding, but there are some tricks and heuristics you could apply. See How can I detect the encoding/codepage of a text file.
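
One trick .NET gives you for free is byte order mark (BOM) detection: StreamReader can recognize a BOM at the start of a stream. Note this only identifies Unicode encodings that actually write a BOM; it cannot distinguish legacy code pages such as koi8-r from windows-1251. A minimal sketch ("input.txt" is a placeholder path):

using System;
using System.IO;
using System.Text;

class BomSniffer
{
    static void Main()
    {
        // The fallback encoding is used only if no BOM is found.
        using (var reader = new StreamReader("input.txt",
            Encoding.Default, detectEncodingFromByteOrderMarks: true))
        {
            reader.Peek(); // force the reader to examine the BOM
            Console.WriteLine(reader.CurrentEncoding.WebName);
        }
    }
}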

Edit:

"U+xxxx" is how you usually refer to a specific Unicode code point (the number assigned to a Unicode character), e.g. the code point of the letter "A" (Latin capital A) is U+0041.

Is your input string actually "\\U+1043..." (backslash, backslash, capital U etc.) or is it only displayed like this, e.g. in a debugger window? If it's the former, then somebody made a mistake while encoding the text, maybe by trying to write a Unicode literal and accidentally escaping the backslash by writing a second one. (Edit2: Or the characters were deliberately saved in an escaped way to write them into an ASCII-encoded file/stream/etc.) As far as I know, the .NET encoding classes do not help you here; you need to parse the string by hand.

By the way, the numbers in your example are strange. In standard notation, the number after "U+" is a hexadecimal number, not a decimal one. Read as hex numbers, your code points refer to characters from completely unrelated scripts (Burmese, Georgian Mkhedruli, Hangul Jamo); read as decimal numbers, however, they all refer to Cyrillic letters.

Edit3: To parse it, look for substrings of the form \\U+xxxx (with each x being a digit), convert xxxx to an int n, create the character with that code point (Char.ConvertFromUtf32(n)), and replace the whole substring with that character, as in the sketch below.
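
A minimal sketch of that approach, assuming (per the observation above) that the digits are decimal code points; the class and method names are mine:

using System;
using System.Text;
using System.Text.RegularExpressions;

class UPlusDecoder
{
    // Replaces every \U+nnnn escape with the character it denotes.
    // Assumptions taken from the question (this is not a standard format):
    // the digits are decimal code points, and each escape starts with one
    // or more backslashes (tolerating the doubled-backslash case).
    static string Decode(string input)
    {
        return Regex.Replace(input, @"\\+U\+(\d+)",
            m => Char.ConvertFromUtf32(int.Parse(m.Groups[1].Value)));
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so Cyrillic prints correctly

        string input = @"\U+1043\U+1072\U+1073\U+1072\U+1088\U+1080" +
                       @"\U+1090\U+1085\U+1086\U+1089\U+1090\U+1100";

        Console.WriteLine(Decode(input)); // prints "Габаритность"
    }
}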

answered Oct 18 '22 by Sebastian Negraszus