What is the reason that Encoding.UTF8.GetString and Encoding.UTF8.GetBytes are not inverses of each other?

I am probably missing something, but I do not understand why Encoding.UTF8.GetString and Encoding.UTF8.GetBytes do not work as inverse transformations of each other.

In the following example, myOriginalBytes and asBytes are not equal; even their lengths differ. Could anyone explain what I am missing?

byte[] myOriginalBytes = GetRandomByteArray();
var asString = Encoding.UTF8.GetString(myOriginalBytes);
var asBytes = Encoding.UTF8.GetBytes(asString);

asked Jul 31 '17 07:07

g.pickardou

1 Answer

They're inverses if you start with a valid UTF-8 byte sequence, but they're not if you just start with an arbitrary byte sequence.

Let's take a concrete and very simple example: a single byte, 0xff. That's not the valid UTF-8 encoding for any text. So if you have:

byte[] bytes = { 0xff };
string text = Encoding.UTF8.GetString(bytes);

... you'll end up with text being a single character, U+FFFD, the "Unicode replacement character" which is used to indicate that there was an error decoding the binary data. You'll end up with that replacement character for any invalid sequence - so you'd get the same text if you started with 0x80 for example. Clearly if multiple binary inputs are decoded to the same textual output, it can't possibly be a fully-reversible transform.
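To make the collision concrete, here is a minimal sketch, assuming the default Encoding.UTF8 instance with its replacement fallback. The class name ReplacementDemo and the helper RoundTrip are made up for illustration. It decodes two different invalid inputs and then re-encodes the result:

```csharp
using System;
using System.Text;

class ReplacementDemo
{
    // Hypothetical helper: decode bytes as UTF-8, then encode the text again.
    public static byte[] RoundTrip(byte[] input)
    {
        string text = Encoding.UTF8.GetString(input);
        return Encoding.UTF8.GetBytes(text);
    }

    static void Main()
    {
        // Two different inputs, neither of which is valid UTF-8.
        string fromFf = Encoding.UTF8.GetString(new byte[] { 0xff });
        string from80 = Encoding.UTF8.GetString(new byte[] { 0x80 });

        // Both decode to the same one-character string, U+FFFD.
        Console.WriteLine(fromFf == from80);   // True
        Console.WriteLine(fromFf == "\uFFFD"); // True

        // Re-encoding produces the UTF-8 form of U+FFFD: three bytes,
        // not the single byte we started with.
        Console.WriteLine(BitConverter.ToString(RoundTrip(new byte[] { 0xff }))); // EF-BF-BD
    }
}
```

This also accounts for the length difference in the question: each invalid sequence that becomes U+FFFD comes back as three bytes.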

If you have arbitrary binary data, you should not use Encoding to get text from it; use Convert.ToBase64String or a hex representation instead. Encoding is for data that is naturally textual.
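As a rough sketch of the Base64 route: Convert.FromBase64String is the counterpart of Convert.ToBase64String, while the class name Base64RoundTrip, the helper RoundTrip and the sample bytes are only illustrative.

```csharp
using System;
using System.Linq;

class Base64RoundTrip
{
    // Hypothetical helper: bytes -> Base64 text -> bytes.
    public static byte[] RoundTrip(byte[] input)
    {
        string asText = Convert.ToBase64String(input);
        return Convert.FromBase64String(asText);
    }

    static void Main()
    {
        // Arbitrary binary data, including bytes that are invalid as UTF-8.
        byte[] original = { 0xff, 0x80, 0x00, 0x41 };

        byte[] decoded = RoundTrip(original);

        // Base64 maps every byte sequence to distinct text, so nothing is lost.
        Console.WriteLine(original.SequenceEqual(decoded)); // True
    }
}
```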

If you go in the opposite direction, like this:

string text = GetRandomText();
byte[] bytes = Encoding.UTF8.GetBytes(text);
string text2 = Encoding.UTF8.GetString(bytes);

... I'd expect text2 to be equal to text with the exception of odd situations where you've got invalid text to start with, e.g. with "half" a surrogate pair.
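The "half a surrogate pair" case can be sketched like this, again assuming the default Encoding.UTF8 replacement behaviour. The class name SurrogateDemo, the helper RoundTrip and the sample strings are made up for illustration.

```csharp
using System;
using System.Text;

class SurrogateDemo
{
    // Hypothetical helper: text -> UTF-8 bytes -> text.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    static void Main()
    {
        // Well-formed text, including a complete surrogate pair (U+1F600).
        string valid = "h\u00E9llo \uD83D\uDE00";
        Console.WriteLine(RoundTrip(valid) == valid); // True

        // A lone high surrogate is not valid text, so it cannot be encoded as such.
        string broken = "a\uD83Dz";
        string result = RoundTrip(broken);

        Console.WriteLine(result == broken);     // False
        Console.WriteLine(result == "a\uFFFDz"); // True: the lone surrogate became U+FFFD
    }
}
```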

answered Oct 07 '22 16:10

Jon Skeet

Tags: c#, .net, utf-8
