I am currently storing Unicode text as bytes and using the Encoding class to convert between bytes and strings.
However, I saw that there is also an Encoder class, and it seems to do the same thing as the Encoding class. Does anyone know what the difference between them is, and when to use each?
Here are the Microsoft documentation pages:
Encoder: https://msdn.microsoft.com/en-us/library/system.text.encoder(v=vs.110).aspx
Encoding: https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
There is definitely a difference. An Encoding is an algorithm for transforming a sequence of characters into bytes and vice versa. An Encoder is a stateful object that transforms sequences of characters into bytes. To get an Encoder object you usually call GetEncoder on an Encoding object.
Why is it necessary to have a stateful transformation? Imagine you are trying to efficiently encode a long sequence of characters. You want to avoid creating a lot of arrays or one huge array, so you break the characters down into, say, reusable 1K-character buffers. However, this can produce illegal character sequences at the buffer boundaries, for example a UTF-16 surrogate pair split across two separate calls to GetBytes. The Encoder object knows how to handle this and saves the necessary state across successive calls to GetBytes.
Thus you use one Encoder for transforming one self-contained block of text. I believe you can reuse an Encoder instance for transforming multiple sections of text as long as you have called GetBytes with flush equal to true on the last array of characters of each section. If you just want to easily encode short strings, use the Encoding.GetBytes methods. For decoding operations there is a similar Decoder class that holds the decoding state.
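To make the surrogate-pair problem concrete, here is a minimal sketch that encodes the same text in small chunks twice: once through a single Encoder, and once by calling Encoding.GetBytes on each chunk independently. The class and method names (EncoderSketch, EncodeInChunks, EncodeChunksStateless) and the chunk size of 2 are made up for illustration. Only GetEncoder, Encoder.GetBytes, Encoding.GetBytes and GetMaxByteCount are real .NET calls.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class EncoderSketch
{
    // Hypothetical helper: encodes text chunk by chunk with ONE stateful Encoder.
    public static byte[] EncodeInChunks(string text, int chunkSize)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        char[] chars = text.ToCharArray();
        // Reusable output buffer, large enough for one chunk plus a carried-over surrogate.
        byte[] buffer = new byte[Encoding.UTF8.GetMaxByteCount(chunkSize)];
        var output = new MemoryStream();

        for (int offset = 0; offset < chars.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, chars.Length - offset);
            // Flush only on the last chunk, so a trailing high surrogate
            // in an earlier chunk is kept as state for the next call.
            bool flush = offset + count >= chars.Length;
            int written = encoder.GetBytes(chars, offset, count, buffer, 0, flush);
            output.Write(buffer, 0, written);
        }
        return output.ToArray();
    }

    // Hypothetical helper: same chunking, but each chunk is encoded on its own,
    // so no state survives between calls.
    public static byte[] EncodeChunksStateless(string text, int chunkSize)
    {
        char[] chars = text.ToCharArray();
        var output = new MemoryStream();

        for (int offset = 0; offset < chars.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, chars.Length - offset);
            byte[] bytes = Encoding.UTF8.GetBytes(chars, offset, count);
            output.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }

    public static void Main()
    {
        // 'a', a surrogate pair (U+1F600), 'b'. With chunkSize 2 the pair is split.
        string text = "a\uD83D\uDE00b";
        byte[] whole = Encoding.UTF8.GetBytes(text);

        byte[] stateful = EncodeInChunks(text, 2);
        byte[] stateless = EncodeChunksStateless(text, 2);

        Console.WriteLine(stateful.SequenceEqual(whole));  // True
        Console.WriteLine(stateless.SequenceEqual(whole)); // False
    }
}
```

In the stateless version each half of the split pair is treated as an invalid lone surrogate. With the default Encoding.UTF8 settings it is replaced by U+FFFD rather than throwing, so the output is silently wrong. A Decoder can be driven the same way in the other direction.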