
How to write with a single-byte character encoding?

I have a web service that returns the config file to a low-level hardware device. The manufacturer of this device tells me he only supports single-byte character sets for this config file.

On this wiki page I found out that the following should be single byte character sets:

  • ISO 8859
  • ISO/IEC 646 (I could not find this one here)
  • various Microsoft/IBM code pages

But when I call Encoding.GetMaxByteCount(1) on these character sets it always returns 2.

I also tried various other encodings (for instance IBM437), but GetMaxByteCount(1) returns 2 for those as well.

The property Encoding.IsSingleByte seems unreliable according to this:

You should be careful in what your application does with the value for IsSingleByte. An assumption of how an Encoding will proceed may still be wrong. For example, Windows-1252 has a value of true for Encoding.IsSingleByte, but Encoding.GetMaxByteCount(1) returns 2. This is because the method considers potential leftover surrogates from a previous decoder operation.

The method Encoding.GetMaxByteCount has a similar caveat according to this:

Note that GetMaxByteCount considers potential leftover surrogates from a previous decoder operation. Because of the decoder, passing a value of 1 to the method retrieves 2 for a single-byte encoding, such as ASCII. Your application should use the IsSingleByte property if this information is necessary.

Because of this, I am no longer sure which one to use.


Sjors Miltenburg asked Sep 21 '12 07:09




1 Answer

Basically, GetMaxByteCount considers an edge case that you will probably never need in regular code: specifically, what it says about the decoder and surrogates. The point here is that some code points are encoded as surrogate pairs, and in unfortunate cases a pair can straddle two calls to GetBytes() / GetChars() (on the encoder/decoder). As a consequence, the implementation may theoretically still have a single byte/character buffered and waiting to be processed, so GetMaxByteCount has to account for it.
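To make that buffering concrete, here is a minimal sketch that drives an Encoder by hand. The class and method names are hypothetical, and the choice of ASCII and of the sample characters is an assumption made for illustration. The first call hands the encoder a lone high surrogate without flushing, so it should hold the character back; the second call passes one ordinary character, at which point the pending surrogate can no longer be completed and should be emitted through the replacement fallback together with the new character.

```csharp
using System;
using System.Text;

static class LeftoverSurrogateDemo
{
    // Hypothetical helper: returns the number of bytes written by two
    // consecutive Encoder.GetBytes calls on the same stateful encoder.
    public static int[] EncodeInTwoCalls()
    {
        Encoder encoder = Encoding.ASCII.GetEncoder();
        byte[] buffer = new byte[4];

        // Call 1: a lone high surrogate with flush = false. The encoder
        // keeps it as state, waiting for a possible low surrogate.
        int first = encoder.GetBytes(new[] { '\uD83D' }, 0, 1, buffer, 0, false);

        // Call 2: a single ordinary character. The pending surrogate is now
        // unpaired, so the fallback output for it and the byte for 'a' are
        // both written during this one-character call.
        int second = encoder.GetBytes(new[] { 'a' }, 0, 1, buffer, first, false);

        return new[] { first, second };
    }

    static void Main()
    {
        int[] counts = EncodeInTwoCalls();
        Console.WriteLine("first call wrote " + counts[0] + " byte(s)");
        Console.WriteLine("second call wrote " + counts[1] + " byte(s)");
        Console.WriteLine("GetMaxByteCount(1) = " + Encoding.ASCII.GetMaxByteCount(1));
    }
}
```

If the second call reports more bytes than characters passed in, that is the worst case GetMaxByteCount(1) is sized for, even though each character on its own still maps to one byte.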

However! All of this only makes sense if you are using the encoder/decoder directly. If you are using operations on the Encoding, such as Encoding.GetBytes, then all of this is abstracted away from you and you will never need to know. In which case, just use IsSingleByte and you'll be fine.
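As a rough illustration of staying on the Encoding level, the sketch below checks IsSingleByte and then calls Encoding.GetBytes on the whole string. The class name SingleByteConfigWriter, the sample text, and the choice of iso-8859-1 are assumptions for the example, not something the device manufacturer specified. The exception fallback is also an addition that the answer does not mention: it makes characters the target character set cannot represent fail loudly instead of silently turning into '?'.

```csharp
using System;
using System.Text;

static class SingleByteConfigWriter
{
    // Hypothetical helper: encodes text with the named encoding, refusing
    // encodings that are not single-byte and characters that cannot be mapped.
    public static byte[] Encode(string text, string encodingName)
    {
        Encoding encoding = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        if (!encoding.IsSingleByte)
        {
            throw new ArgumentException(encodingName + " is not a single-byte encoding.");
        }

        // Encoding.GetBytes processes the whole string in one stateless call,
        // so no surrogate can be left over from a previous operation.
        return encoding.GetBytes(text);
    }

    static void Main()
    {
        string config = "name=caf\u00E9";
        byte[] bytes = Encode(config, "iso-8859-1");
        Console.WriteLine(config.Length + " chars -> " + bytes.Length + " bytes");
    }
}
```

On .NET Core and later, code pages such as windows-1252 or IBM437 are only available after registering CodePagesEncodingProvider from the System.Text.Encoding.CodePages package, which is why this sketch uses iso-8859-1.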

Marc Gravell answered Sep 19 '22 17:09