that a char is represented by a byte, that a byte can always be counted upon to have 8 bits, that sizeof (char) is always 1, and that the maximum theoretical amount of memory I can allocate (counted in chars) is the number of bytes of RAM (+ swap space).
Eight bits are commonly called a byte (more precisely, an octet). An 8-bit character set can contain 256 characters.
"char" has always been a misspelling of "byte" in C. sizeof(char) has to be 1, but char doesn't have to be 1 byte in size. It's more correct to say that sizeof(foo) returns a result relative to sizeof(char).
The ISO C Standard requires CHAR_BIT to be at least 8.
char is also 16 bits on the Texas Instruments C54x DSPs, which turned up, for example, in OMAP2. There are other DSPs out there with 16- and 32-bit char. I think I even heard about a 24-bit DSP, but I can't remember which one, so maybe I imagined it.
Another consideration is that POSIX mandates CHAR_BIT == 8. So if you're using POSIX, you can assume it. If someone later needs to port your code to a near-implementation of POSIX that just so happens to have the functions you use but a different-size char, that's their bad luck.
In general, though, I think it's almost always easier to work around the issue than to think about it. Just type CHAR_BIT. If you want an exact 8-bit type, use int8_t. Your code will noisily fail to compile on implementations which don't provide one, instead of silently using a size you didn't expect. At the very least, if I hit a case where I had a good reason to assume it, then I'd assert it.
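A minimal sketch of "just type CHAR_BIT": a hypothetical bit-set helper (BITSET_BYTES, bit_set and bit_test are illustrative names, not from any library) that sizes and indexes its storage with CHAR_BIT instead of a literal 8, so it keeps working if char happens to be wider.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Hypothetical helper: number of unsigned chars needed to hold nbits bits.
   Using CHAR_BIT instead of 8 keeps this correct when char is wider. */
#define BITSET_BYTES(nbits) (((nbits) + CHAR_BIT - 1) / CHAR_BIT)

/* Set bit i; the element index and bit offset both come from CHAR_BIT. */
static void bit_set(unsigned char *set, size_t i)
{
    set[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Return 1 if bit i is set, 0 otherwise. */
static int bit_test(const unsigned char *set, size_t i)
{
    return (set[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}
```

Usage: `unsigned char flags[BITSET_BYTES(20)] = {0}; bit_set(flags, 9);` allocates 3 chars on an 8-bit-char platform and fewer where char is wider, with no code changes.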
When writing code, and thinking about cross-platform support (e.g. for general-use libraries), what sort of consideration is it worth giving to platforms with non-8-bit char?
It's not so much that it's "worth giving consideration" to something as it is playing by the rules. In C++, for example, the standard says all bytes will have "at least" 8 bits. If your code assumes that bytes have exactly 8 bits, you're relying on something the standard doesn't guarantee.
This may seem silly now -- "of course all bytes have 8 bits!", I hear you saying. But lots of very smart people have relied on assumptions that were not guarantees, and then everything broke. History is replete with such examples.
For instance, many early-90s developers assumed that a no-op CPU delay loop taking a fixed number of cycles would always take the same amount of wall-clock time, because most consumer CPUs were roughly equivalent in speed. Unfortunately, computers got faster very quickly. This led to PCs with "Turbo" buttons, whose purpose, ironically, was to slow the computer down so that games using the time-delay technique could be played at a reasonable speed.
One commenter asked where in the standard it says that char must have at least 8 bits. It's in section 5.2.4.2.1 of the C standard. This section defines CHAR_BIT, the number of bits in the smallest addressable unit (a byte), and lists 8 as its value. It also says:
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
So any value of 8 or higher is a valid choice for an implementation's CHAR_BIT.
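To make the definition concrete, here is a small sketch (count_uchar_bits is a hypothetical name) that counts the bits of unsigned char at run time. Because unsigned char has no padding bits, a conforming implementation should report exactly CHAR_BIT, which in turn must be at least 8.

```c
#include <limits.h>  /* CHAR_BIT */

/* Hypothetical helper: shift a 1 left until it falls off the top of an
   unsigned char, counting the positions it passed through. The cast
   truncates the promoted int back to unsigned char on each step. */
static int count_uchar_bits(void)
{
    int bits = 0;
    for (unsigned char v = 1; v != 0; v = (unsigned char)(v << 1))
        ++bits;
    return bits;
}
```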
C implementations on machines with 36-bit architectures typically use 9-bit bytes. According to Wikipedia, machines with 36-bit architectures include:
A few of which I'm aware:
There is no such thing as completely portable code. :-)
Yes, there may be various byte/char sizes. Yes, there may be C/C++ implementations for platforms with highly unusual values of CHAR_BIT and UCHAR_MAX. Yes, sometimes it is possible to write code that does not depend on char size.
However, almost no real code is standalone. For example, you may be writing code that sends binary messages over a network (the protocol is not important). You may define structures that contain the necessary fields. Then you have to serialize them. Just copying a structure byte for byte into an output buffer is not portable: you generally know neither the platform's byte order nor the alignment of the structure members, so the structure just holds the data; it doesn't describe how the data should be serialized.
OK. You can perform byte-order transformations and copy the structure members (e.g. uint32_t or similar) into the buffer using memcpy. Why memcpy? Because on many platforms it is not possible to write a 32-bit value (16-bit, 64-bit -- no difference) to an address that is not properly aligned.
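A sketch of that serialization step, assuming CHAR_BIT == 8 as this answer does. put_u32_be and get_u32_be are hypothetical helpers that handle byte order with shifts and masks (an alternative to swapping first and then calling memcpy), while store_u32_native shows the memcpy route for a possibly misaligned destination. None of them depends on how a struct is laid out in memory.

```c
#include <stdint.h>  /* uint32_t */
#include <string.h>  /* memcpy */

/* Hypothetical helper: write v into buf in big-endian (network) order.
   Shifts and masks give the same bytes on any host byte order, and
   storing individual unsigned chars has no alignment requirement. */
static void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFFu);
    buf[1] = (unsigned char)((v >> 16) & 0xFFu);
    buf[2] = (unsigned char)((v >> 8) & 0xFFu);
    buf[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: read back a big-endian uint32_t (inverse of put_u32_be). */
static uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* The memcpy route: copy a host-order uint32_t to a possibly misaligned
   position. Any byte swapping must happen before this call; memcpy only
   sidesteps the alignment problem. */
static void store_u32_native(unsigned char *buf, uint32_t v)
{
    memcpy(buf, &v, sizeof v);
}
```

Writing at `buf + 1` is deliberate in a usage like `put_u32_be(buf + 1, 0x11223344u)`: the destination is misaligned, which neither helper cares about.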
So, you have already done a lot to achieve portability.
And now the final question. We have a buffer. The data from it is sent to a TCP/IP network. Such a network assumes 8-bit bytes. The question is: what type should the buffer be? What if your chars are 9-bit? 16-bit? 24? Maybe each char corresponds to one 8-bit byte sent to the network, and only 8 bits of it are used? Or maybe multiple network bytes are packed into 24/16/9-bit chars? That's an open question, and it is hard to believe there is a single answer that fits all cases. A lot depends on the socket implementation for the target platform.
So, here is my point. Usually code can be made portable to a certain extent relatively easily. It's very important to do so if you expect to use the code on different platforms. However, improving portability beyond that point requires a lot of effort and often gives little, as real code almost always depends on other code (the socket implementation in the example above). I am sure that for about 90% of code, the ability to work on platforms with non-8-bit bytes is almost useless, because it uses an environment that is bound to 8-bit bytes. Just check the byte size and perform a compile-time assertion. You will almost surely have to rewrite a lot for a highly unusual platform anyway.
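The "check the byte size and perform a compile-time assertion" step can be as short as this sketch. The #if/#error form works in any C dialect; _Static_assert needs C11 (C++11 and later spell it static_assert). Either one stops the build with a readable message instead of letting the code run with the wrong assumption.

```c
#include <limits.h>  /* CHAR_BIT */

/* Preprocessor check: portable to every C dialect. */
#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes (CHAR_BIT == 8)"
#endif

/* C11 equivalent, usable at file scope. */
_Static_assert(CHAR_BIT == 8, "This code assumes 8-bit bytes (CHAR_BIT == 8)");
```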
But if your code is highly "standalone" -- why not? You may write it in a way that allows different byte sizes.
It appears that you can still buy an IM6100 (i.e. a PDP-8 on a chip) out of a warehouse. That's a 12-bit architecture.
Many DSP chips have 16- or 32-bit char. TI routinely makes such chips, for example.