
It should be so obvious, but why does this fail?

Tags: c#, .net

I've been coding .NET for years now, yet I feel like a n00b. Why does the following code fail?

byte[] a = Guid.NewGuid().ToByteArray(); // 16 bytes in array
string b = new UTF8Encoding().GetString(a);
byte[] c = new UTF8Encoding().GetBytes(b);
Guid d = new Guid(c);    // Throws exception (32 bytes received from c)

Update

Approved the answer from CodesInChaos. The reason the 16 bytes become 32 bytes is explained in his answer, which also states:

the default constructor of UTF8Encoding has error checking disabled

IMHO the UTF-8 encoding should throw an exception when asked to decode a byte array that contains invalid bytes into a string. To make the .NET Framework behave that way, the code should have been written as follows:

 byte[] a = Guid.NewGuid().ToByteArray();
 string b = new UTF8Encoding(false, true).GetString(a);  // Throws exception as expected
 byte[] c = new UTF8Encoding(false, true).GetBytes(b);
 Guid d = new Guid(c);
Tim Skauge asked Jan 22 '11 18:01


1 Answer

Not every sequence of bytes is a valid UTF-8 encoded string.

A GUID can contain almost any sequence of bytes. But UTF-8 has specific rules for which byte sequences are allowed once a byte value is >127, and a GUID will quite often not follow these rules.

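When decoding without error detection, each invalid sequence is replaced with the replacement character U+FFFD, which itself takes three bytes in UTF-8. So when you encode the corrupted string back to a byte array, you get an array longer than 16 bytes, which the constructor of Guid doesn't accept.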
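
To see the growth on a small input, here is a minimal sketch using a fixed three-byte array rather than a GUID (the array and the variable names are made up for illustration). It is written as top-level statements, which assumes C# 9 or later; on an older compiler the same lines would go inside a Main method.

```csharp
using System;
using System.Text;

// 0xFF can never occur in well-formed UTF-8, so the lenient decoder
// cannot map it to a character and substitutes U+FFFD instead.
byte[] original = { 0x41, 0xFF, 0x42 };   // 'A', invalid byte, 'B'

string text = new UTF8Encoding().GetString(original);
byte[] roundTripped = new UTF8Encoding().GetBytes(text);

Console.WriteLine(text.Length);           // 3 chars: 'A', U+FFFD, 'B'
Console.WriteLine(text[1] == '\uFFFD');   // True
Console.WriteLine(roundTripped.Length);   // 5, because U+FFFD encodes as EF BF BD
```

One invalid byte turned into three, so the array came back two bytes longer. A GUID with several such bytes grows accordingly, and the exact length depends on how many of its bytes happen to be invalid.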


The documentation on UTF8Encoding.GetString states:

With error detection, an invalid sequence causes this method to throw a ArgumentException. Without error detection, invalid sequences are ignored, and no exception is thrown.

and the default constructor of UTF8Encoding has error checking disabled (don't ask me why):

This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.
Note
For security reasons, your applications are recommended to enable error detection by using the constructor that accepts a throwOnInvalidBytes parameter and setting that parameter to true.
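
As a rough sketch of what enabling error detection looks like, the snippet below feeds a deliberately invalid byte (0xFF, chosen for illustration) to an instance built with throwOnInvalidBytes set to true. The variable names are made up, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Text;

byte[] bytes = { 0x41, 0xFF, 0x42 };      // 0xFF is never valid in UTF-8

// First argument: emit a byte order mark? Second: throw on invalid bytes?
var strict = new UTF8Encoding(false, true);

bool threw = false;
try
{
    strict.GetString(bytes);
}
catch (ArgumentException ex)
{
    // The concrete type is DecoderFallbackException, which derives
    // from ArgumentException.
    threw = true;
    Console.WriteLine(ex.GetType().Name);
}

Console.WriteLine(threw);                 // True
```

With the strict instance the bad input is rejected at the GetString call, instead of silently producing a string that no longer round-trips.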


You might want to use Base64 encoding instead of UTF-8. That way you can map any valid byte sequence into a string and back.
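
A minimal sketch of the Base64 round trip, reusing the variable names from the question. Convert.ToBase64String and Convert.FromBase64String are the standard framework methods; the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq;

byte[] a = Guid.NewGuid().ToByteArray();  // 16 bytes
string b = Convert.ToBase64String(a);     // 24 characters, including "==" padding
byte[] c = Convert.FromBase64String(b);   // the same 16 bytes again
Guid d = new Guid(c);                     // no exception this time

Console.WriteLine(c.Length);              // 16
Console.WriteLine(a.SequenceEqual(c));    // True
Console.WriteLine(d == new Guid(a));      // True
```

Base64 works here because it is defined over arbitrary bytes, whereas UTF-8 decoding only accepts byte sequences that already represent text.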

CodesInChaos answered Sep 28 '22 07:09