I have a webservice that returns a config file to a low-level hardware device. The manufacturer of the device tells me it only supports single-byte character sets for this config file.
On this wiki page I found that the following should be single-byte character sets:
But when I call Encoding.GetMaxByteCount(1) on these character sets, it always returns 2.
I also tried various other encodings (for instance IBM437), but GetMaxByteCount returns 2 for those as well.
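Roughly what I am doing is this (a minimal sketch; the encoding names are just the ones mentioned here, and on .NET Core / .NET 5+ the legacy code pages such as windows-1252 and IBM437 additionally need the System.Text.Encoding.CodePages provider registered):

    using System;
    using System.Text;

    class Program
    {
        static void Main()
        {
            // On .NET Core / .NET 5+ the legacy code pages are only available after
            // registering the code pages provider (System.Text.Encoding.CodePages):
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            foreach (string name in new[] { "us-ascii", "windows-1252", "IBM437" })
            {
                Encoding enc = Encoding.GetEncoding(name);
                Console.WriteLine(
                    $"{name}: IsSingleByte={enc.IsSingleByte}, GetMaxByteCount(1)={enc.GetMaxByteCount(1)}");
                // All of these report IsSingleByte=True but GetMaxByteCount(1)=2.
            }
        }
    }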
The method Encoding.IsSingleByte seems unreliable according to this:
You should be careful in what your application does with the value for IsSingleByte. An assumption of how an Encoding will proceed may still be wrong. For example, Windows-1252 has a value of true for Encoding.IsSingleByte, but Encoding.GetMaxByteCount(1) returns 2. This is because the method considers potential leftover surrogates from a previous decoder operation.
Also, the method Encoding.GetMaxByteCount has some of the same issues, according to this:
Note that GetMaxByteCount considers potential leftover surrogates from a previous decoder operation. Because of the decoder, passing a value of 1 to the method retrieves 2 for a single-byte encoding, such as ASCII. Your application should use the IsSingleByte property if this information is necessary.
Because of this I am no longer sure what to use.
Further reading.
Single-byte characters are represented as a series of lowercase letters. The format for representing one single-byte character abstractly is "a"; here "a" stands for any single-byte character, not for the letter "a" itself. The letter "s" does not show up in examples that represent strings of single-byte characters.
One-byte character sets can contain at most 256 characters. The current standard, though, is Unicode, which represents the characters of all writing systems in the world in a single set and, in encodings such as UTF-16, uses two or more bytes per character.
A single-byte character set (SBCS) is a mapping of 256 individual characters to their identifying code values, implemented as a code page. An SBCS can correspond either to a Windows code page or an OEM code page. An SBCS code page can also include a non-native code page, for example, an EBCDIC code page.
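To make that "mapping of 256 individual characters" concrete, here is a small sketch (windows-1252 is just an example single-byte code page) that decodes every possible byte value back to the character it maps to:

    using System;
    using System.Text;

    class Program
    {
        static void Main()
        {
            // windows-1252 is one example of an SBCS: each byte value 0x00..0xFF
            // maps to at most one character (a few slots are unassigned).
            // On .NET Core / .NET 5+ this code page first needs
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            Encoding sbcs = Encoding.GetEncoding("windows-1252");

            for (int b = 0; b <= 0xFF; b++)
            {
                string ch = sbcs.GetString(new[] { (byte)b });
                Console.WriteLine($"0x{b:X2} -> U+{(int)ch[0]:X4}");
            }
        }
    }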
Basically, GetMaxByteCount considers an edge case that you will probably never need in regular code, specifically what it says about the decoder and surrogates. The point here is that some code points are encoded as surrogate pairs, which in unfortunate cases can mean that a pair straddles two calls to GetBytes() / GetChars() (on the encoder/decoder). As a consequence, the implementation may theoretically have a single byte/character still buffered and waiting to be processed, therefore GetMaxByteCount needs to warn about this.
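To see that buffering in action, here is a rough sketch that drives an Encoder for a single-byte encoding (ASCII) directly; the exact fallback output may vary, but the point is that a call with one character can produce two bytes because of a leftover surrogate from the previous call:

    using System;
    using System.Text;

    class Program
    {
        static void Main()
        {
            // U+20000 is stored in a .NET string as the surrogate pair D840 DC00;
            // 'A' is an ordinary follow-up character.
            char[] chars = { '\uD840', '\uDC00', 'A' };

            Encoder encoder = Encoding.ASCII.GetEncoder();
            byte[] buffer = new byte[8];

            // Pass only the high surrogate. With flush: false the encoder keeps it
            // in its internal buffer instead of emitting anything (expect 0 bytes).
            int first = encoder.GetBytes(chars, 0, 1, buffer, 0, flush: false);
            Console.WriteLine(first);

            // Now pass a single ordinary character. The leftover surrogate is
            // flushed through the fallback ('?') together with 'A', so one input
            // character yields two bytes - which is why GetMaxByteCount(1) is 2.
            int second = encoder.GetBytes(chars, 2, 1, buffer, 0, flush: true);
            Console.WriteLine(second);
        }
    }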
However! All of this only makes sense if you are using the encoder/decoder directly. If you are using operations on the Encoding, such as Encoding.GetBytes, then all of this is abstracted away from you and you will never need to know about it. In which case, just use IsSingleByte and you'll be fine.
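So for the config-file scenario in the question, a check like the following sketch should be enough (the helper name is made up; you could also swap in EncoderExceptionFallback if you would rather fail on characters the code page cannot represent than emit "?"):

    using System;
    using System.Text;

    static class ConfigEncoding
    {
        // Hypothetical helper: encodes the config text with the requested encoding,
        // after verifying that it really is a single-byte encoding.
        public static byte[] GetConfigBytes(string configText, string encodingName)
        {
            Encoding encoding = Encoding.GetEncoding(encodingName);

            // IsSingleByte is the right check here; GetMaxByteCount(1) == 2 is only
            // about the encoder/decoder edge case described above.
            if (!encoding.IsSingleByte)
                throw new ArgumentException(
                    $"'{encodingName}' is not a single-byte encoding.", nameof(encodingName));

            // A plain Encoding.GetBytes call never leaves surrogates buffered between
            // calls, so you get straightforward single-byte output; characters outside
            // the code page go through the fallback ('?' by default).
            return encoding.GetBytes(configText);
        }
    }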