 

Why did POSIX mandate CHAR_BIT==8?

Tags: c, posix

There's a note in the POSIX rationale that mandating CHAR_BIT be 8 was a necessary concession to stay aligned with C99 without throwing out sockets/networking, but I've never seen an explanation of what exactly the conflict was. Does anyone have anecdotes or citations for why it was deemed necessary?

Edit: I've gotten a lot of speculative answers about why it's desirable for CHAR_BIT to be 8, and I agree, but what I'm really looking for is the technical conflict between C99 and the networking material in POSIX. My best guess is that it has something to do with C99 requiring the uint*_t types to be exact-sized (no padding), whereas the inttypes.h previously in POSIX made no such requirement.
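To make that guess concrete, here is a minimal sketch (my own reading, not anything taken from the POSIX rationale) of how the C99 exact-width types pin down the byte size:

```c
/* Sketch of the guess above (an assumption, not from the POSIX rationale):
 * C99 says uint8_t, where it exists, is exactly 8 bits wide with no padding
 * bits.  Since sizeof(uint8_t) must be at least one byte, any implementation
 * that provides uint8_t necessarily has CHAR_BIT == 8. */
#include <limits.h>
#include <stdint.h>

#ifdef UINT8_MAX   /* uint8_t is optional in plain C99... */
/* ...so an interface that effectively requires it also requires 8-bit bytes.
 * The array size below is -1 (a compile error) if that ever fails to hold. */
typedef char uint8_implies_8_bit_byte[(CHAR_BIT == 8) ? 1 : -1];
#endif
```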

asked Jul 08 '11 by R.. GitHub STOP HELPING ICE

2 Answers

Because the vast majority of standards (related to communication) out of ANSI and ISO talk in terms of octets (8-bit values). There is none of that wishy-washy variable-sized character nonsense :-)

And, since a rather large quantity of C code used char or unsigned char for storing and/or manipulating these values, and assumed they were 8 bits wide, the fact that ISO allowed a variable size would cause problems for that code.

Remember one of the overriding goals of ISO C: existing code is important, existing implementations are not. This is one reason limits.h exists in the first place rather than the standard simply assuming specific values; there was code around that assumed otherwise.

POSIX also followed that same guideline. By mandating a byte size of 8 bits, they prevented the breakage of a huge amount of code already in the real world.
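As a hedged illustration of the kind of existing code this answer has in mind (the function name is mine, purely illustrative, not from any particular codebase):

```c
/* Illustrative sketch only: typical networking code that quietly assumes
 * CHAR_BIT == 8, i.e. that each unsigned char holds exactly one octet. */
#include <stdint.h>

static uint16_t read_be16(const unsigned char *p)
{
    /* Rebuild a 16-bit big-endian wire field from two "bytes".
     * This is only correct if each array element carries exactly 8 bits. */
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}
```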

answered by paxdiablo


Because char is the smallest addressable unit in C, if you made char larger than 8 bits it would be difficult or impossible to write a sockets implementation, as you said. Networks all run on CHAR_BIT == 8 machines. So, if you were to send a message from a machine where CHAR_BIT == 9 to a machine where CHAR_BIT == 8, what is the sockets library to do with the extra bit? There's no reasonable answer to that question. If you truncate the bit, then it becomes hard to specify even something as simple as a buffer to the client of the sockets code -- "It's a char array but you can only use the first 8 bits" would be unreasonable on such a system.

Moreover, going from 8-bit systems to 9-bit would be the same problem -- what's the sockets system to do with that extra bit? If it sets that bit to zero, imagine what happens to someone who puts an int on the wire. You'd have to do all kinds of nasty bitmasking on the 9-bit machine to make it work correctly.
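A rough sketch of that "nasty bitmasking" (a hypothetical helper, assuming a machine with 9-bit chars receiving plain octets from the wire):

```c
/* Hypothetical sketch: on a CHAR_BIT == 9 machine, every "byte" received
 * from the network would have to be masked down to its low 8 bits before
 * the rest of the program could treat the buffer as ordinary octets. */
#include <limits.h>
#include <stddef.h>

static void normalize_octets(unsigned char *buf, size_t n)
{
#if CHAR_BIT != 8
    for (size_t i = 0; i < n; i++)
        buf[i] &= 0xFF;          /* discard the extra bit(s) in each char */
#else
    (void)buf; (void)n;          /* nothing to do: a char is already an octet */
#endif
}
```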

Finally, since 99.9% of machines use 8 bit characters, it's not all that great a limitation. Most machines that use CHAR_BIT != 8 don't have virtual memory either, which would exclude them from POSIX compatibility anyway.

When you're running on a single machine (as standard C assumes), you can do things like be CHAR_BIT-agnostic, because both sides of whatever is reading or writing data agree on what's going on. When you introduce something like sockets, where more than one machine is involved, they MUST agree on things like character size and endianness. (Endianness is pretty much just standardized to big-endian on the wire, though, since many more architectures differ on endianness than on byte size.)
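By way of contrast, endianness is the one wire-format detail POSIX gives you explicit conversion functions for; byte size is simply assumed to be 8. A small example using the standard htons()/ntohs() pair:

```c
/* The wire-format agreement the answer mentions: POSIX supplies
 * htons()/ntohs() to convert 16-bit values between host and network
 * (big-endian) byte order, but there is no equivalent knob for byte
 * size -- the 8-bit octet is baked in. */
#include <arpa/inet.h>
#include <stdint.h>

uint16_t port_to_wire(uint16_t host_port)
{
    return htons(host_port);     /* host order -> network (big-endian) order */
}

uint16_t port_from_wire(uint16_t wire_port)
{
    return ntohs(wire_port);     /* network order -> host order */
}
```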

answered by Billy ONeal