There's a note in the POSIX rationale that mandating CHAR_BIT be 8 was a concession necessary to maintain alignment with C99 without throwing out sockets/networking, but I've never seen an explanation of what exactly the conflict was. Does anyone have anecdotes or citations for why it was deemed necessary?
Edit: I've gotten a lot of speculative answers regarding why it's desirable for CHAR_BIT to be 8, and I agree, but what I'm really looking for is the specific technical conflict between C99 and the networking parts of POSIX. My best guess is that it has something to do with C99 requiring the uint*_t types to be exact-sized (no padding), whereas the inttypes.h previously in POSIX made no such requirement.
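A minimal sketch of what I suspect the conflict looks like, assuming I'm reading C99 7.18.1.1 and POSIX's netinet/in.h correctly: the networking headers define in_port_t and in_addr_t in terms of the exact-width types, and uint8_t is only allowed to exist when the implementation really has a type of exactly 8 bits with no padding:

    #include <limits.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <netinet/in.h>   /* POSIX: in_port_t == uint16_t, in_addr_t == uint32_t */
    #include <arpa/inet.h>    /* htons(), htonl() */

    /* C99 7.18.1.1: uint8_t must be exactly 8 bits with no padding, and may
     * only be defined when such a type exists.  So on a CHAR_BIT != 8 machine
     * there simply is no uint8_t (and no exact 16-/32-bit types either unless
     * the widths happen to line up).                                          */
    #if CHAR_BIT != 8
    #error "no exact 8-bit type here; how would in_port_t / in_addr_t be defined?"
    #endif

    int main(void)
    {
        struct sockaddr_in addr = { 0 };
        addr.sin_family      = AF_INET;
        addr.sin_port        = htons(80);               /* in_port_t: uint16_t */
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* in_addr_t: uint32_t */
        (void)addr;
        return 0;
    }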
Because the vast majority of communication-related standards out of ANSI and ISO talk in terms of octets (8-bit values). There is none of that wishy-washy variable-sized-character nonsense :-)
And, since a rather large quantity of C code used char or unsigned char for storing and/or manipulating these values, and assumed they were 8 bits wide, the fact that ISO C allowed a variable size would cause problems for that code.
Remember one of the overriding goals of ISO C: existing code is important, existing implementations are not. This is one reason why limits.h exists in the first place rather than the standard just assuming specific values, because there was code around that made different assumptions.
POSIX also followed that same guideline. By mandating a byte size of 8 bits, they prevented the breakage of a huge amount of code already in the real world.
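As a rough illustration (not from any particular codebase), this is the kind of guard and the kind of octet-shuffling code in question -- it quietly assumes one unsigned char holds exactly one octet:

    #include <limits.h>

    /* Portable code that cared had to check the assumption explicitly... */
    #if CHAR_BIT != 8
    #error "this code assumes 8-bit bytes"
    #endif

    /* ...because code like this, which rebuilds a 16-bit big-endian field
     * from a received buffer, silently bakes in "1 char == 1 octet".      */
    unsigned int read_be16(const unsigned char *p)
    {
        return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
    }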
Because char is the smallest addressable unit in C, if you made char larger than 8 bits, it would be difficult or impossible to write a sockets implementation, as you said. Networks all run on CHAR_BIT == 8 machines. So, if you were to send a message from a machine where CHAR_BIT == 9 to a machine where CHAR_BIT == 8, what is the sockets library to do with the extra bit? There's no reasonable answer to that question. If you truncate the bit, then it becomes hard to specify even something as simple as a buffer to the client of the sockets code -- "It's a char array, but you can only use the first 8 bits of each char" would be unreasonable on such a system.

Moreover, going from 8-bit systems to 9-bit poses the same problem -- what is the sockets system to do with that extra bit? If it sets that bit to zero, imagine what happens to someone who puts an int on the wire. You'd have to do all kinds of nasty bitmasking on the 9-bit machine to make it work correctly.
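To make the "nasty bitmasking" concrete, here is roughly what a sender that wants to be correct for any CHAR_BIT has to do -- a sketch, not any real sockets implementation. On an 8-bit machine the masks are no-ops; on a 9-bit machine they are what keep the extra bit off the wire:

    /* Pack a 32-bit value into four 8-bit octets, most significant first.
     * The "& 0xFF" is the bitmasking in question: it guarantees that only
     * the low 8 bits of each char are meaningful, whatever CHAR_BIT is.   */
    void pack_be32(unsigned char out[4], unsigned long v)
    {
        out[0] = (unsigned char)((v >> 24) & 0xFF);
        out[1] = (unsigned char)((v >> 16) & 0xFF);
        out[2] = (unsigned char)((v >>  8) & 0xFF);
        out[3] = (unsigned char)( v        & 0xFF);
    }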
Finally, since 99.9% of machines use 8-bit characters, it's not all that great a limitation. Most machines that use CHAR_BIT != 8 don't have virtual memory either, which would exclude them from POSIX compatibility anyway.
When you're running on a single machine (as standard C assumes), you can do things like be CHAR_BIT-agnostic, because both sides of whatever is reading or writing the data agree on what's going on. When you introduce something like sockets, where more than one machine is involved, they MUST agree on things like character size and endianness. (Endianness is pretty much just standardized to big-endian on the wire, though, as many more architectures differ on endianness than they do on byte size.)