I've been coding in .NET for years, yet I feel like a n00b. Why does the following code fail?
byte[] a = Guid.NewGuid().ToByteArray(); // 16 bytes in array
string b = new UTF8Encoding().GetString(a);
byte[] c = new UTF8Encoding().GetBytes(b);
Guid d = new Guid(c); // Throws exception (32 bytes received from c)
Update
Accepted the answer from CodeInChaos. The reason the 16 bytes become 32 bytes is explained in his answer, which also states:
the default constructor of UTF8Encoding has error checking disabled
IMHO the UTF-8 decoder should throw an exception when asked to turn a byte array containing invalid bytes into a string. To make the .NET Framework behave properly, the code should have been written as follows:
byte[] a = Guid.NewGuid().ToByteArray();
string b = new UTF8Encoding(false, true).GetString(a); // Throws exception as expected
byte[] c = new UTF8Encoding(false, true).GetBytes(b);
Guid d = new Guid(c);
Not every sequence of bytes is a valid UTF-8 encoded string.
A GUID can contain almost any sequence of bytes. But UTF-8 has specific rules for which byte sequences are allowed when a byte value is >127, and a Guid will quite often not follow these rules.
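With error detection off, the decoder replaces each invalid sequence with the replacement character U+FFFD, which takes three bytes in UTF-8. So when you encode the corrupted string back to a byte array, you get an array longer than 16 bytes, which the constructor of Guid doesn't accept.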
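To see the growth on a small input, here is a minimal sketch. The class name Utf8RoundTripDemo and the method RoundTrip are made up for illustration; only UTF8Encoding, GetString and GetBytes come from the framework.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original question or answer.
public static class Utf8RoundTripDemo
{
    // Decodes the bytes with the lenient default UTF8Encoding (error
    // detection off), then encodes the resulting string again.
    public static byte[] RoundTrip(byte[] input)
    {
        var lenient = new UTF8Encoding();
        string text = lenient.GetString(input); // invalid bytes become U+FFFD
        return lenient.GetBytes(text);          // U+FFFD is written as EF BF BD
    }
}
```

Calling RoundTrip(new byte[] { 0x41, 0x42 }) gives back 2 bytes, because both bytes are plain ASCII. Calling RoundTrip(new byte[] { 0x41, 0xFF }) gives back 4 bytes, because 0xFF can never appear in valid UTF-8 and the single replacement character costs three bytes. A random Guid usually contains several such bytes, so its round trip rarely stays at 16.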
The documentation on UTF8Encoding.GetString states:
With error detection, an invalid sequence causes this method to throw a ArgumentException. Without error detection, invalid sequences are ignored, and no exception is thrown.
and the default constructor of UTF8Encoding has error checking disabled (don't ask me why):
This constructor creates an instance that does not provide a Unicode byte order mark and does not throw an exception when an invalid encoding is detected.
Note
For security reasons, your applications are recommended to enable error detection by using the constructor that accepts a throwOnInvalidBytes parameter and setting that parameter to true.
You might want to use Base64 encoding instead of UTF-8. That way you can map any valid byte sequence into a string and back.
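A Base64 round trip for a Guid could look roughly like this. The class name GuidBase64 and its two methods are hypothetical names chosen for this sketch; Convert.ToBase64String and Convert.FromBase64String are the framework calls doing the work.

```csharp
using System;

// Hypothetical helper, not part of the original answer.
public static class GuidBase64
{
    // 16 bytes always map to a 24-character Base64 string ending in "==".
    public static string ToBase64(Guid value)
    {
        return Convert.ToBase64String(value.ToByteArray());
    }

    // Reverses ToBase64; FromBase64String throws FormatException on bad input.
    public static Guid FromBase64(string text)
    {
        return new Guid(Convert.FromBase64String(text));
    }
}
```

With this, GuidBase64.FromBase64(GuidBase64.ToBase64(g)) should equal g for any Guid g, because Base64 is defined for every byte sequence and decodes back to exactly the bytes it was given.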