I have a question regarding section 5.2.4.1 Translation Limits in the first American National Standard for Programming Languages - C, also known as ANSI/ISO 9899-1990, ISO/IEC 9899:1990 (E), C89, etc. Simply put, the first ANSI C standard.
It infamously states that a conforming C compiler is only required to handle, and I quote:
5.2.4.1 Translation Limits
- 6 significant initial characters in an external identifier
Now, it is painfully obvious that this is unreasonably short, especially considering that C has nothing resembling namespaces. Descriptive names matter most for external identifiers, since they "pollute" everything you link.
Even the standard library mandates functions with longer names: longjmp, tmpfile, strncat. The last of these, strncat, shows that the committee had to work a bit to invent library names whose initial six characters were unique, instead of the arguably more logical strcatn, which would have collided with strcat.
I enjoy oldish computers. I'm trying to write programs that will compile and work well on pre-C99 platforms; C99 sometimes simply does not exist on my beloved targets. Perhaps I also enjoy trying to really follow the standard. I have learned a lot about C99 and C11 just by digging through the older standards, trying to trace the reasons for certain limitations and implementation issues.
So, even though I know of no compiler or linker actually enforcing or imposing this limitation, it still nags me that I can not claim to have written strictly conforming code if I also want to use legible and non-colliding external identifiers.
The committee began work on the standardization some time during the early eighties and finalized it in 1988 or 1989. Even in the seventies and sixties, handling longer identifiers would not have been any problem whatsoever.
Considering that any compiler wanting to conform to the new standard had to be modified anyway - if only to update the documentation - I don't see how it would have been unreasonable for ANSI to put its foot down and say something like "It is 1989 already. You must handle 31 significant initial characters". It would not have been a problem for any platform, even ancient ones.
From what I've read when searching for this, the problem might come from FORTRAN. In an answer to the question What's the exact role of "significant characters" in C (variables)?, Jonathan Leffler writes:
Part of the trouble may have been Fortran; it only required support for 6 character monocase names, so linkers on systems where Fortran was widely used did not need to support longer names.
To me, this seems like the most reasonable answer to the direct question Why?. But considering that this restriction bugs me every time I want to write a program that could theoretically be built on old systems, I would like to know some more details.
Ultimately, the answers to these questions will make it easier for me to decide how badly I should sleep at night for giving reasonable names to my functions.
Syntax. The first character of an identifier name must be a nondigit (that is, the first character must be an underscore or an uppercase or lowercase letter). ANSI allows six significant characters in an external identifier's name and 31 for names of internal (within a function) identifiers.
The minimum length of an identifier is 1 character. The maximum length of an identifier is currently 128 characters. An identifier must start with an alphanumeric or underscore character. The remainder can contain any combination of alphanumeric characters and underscore characters.
30 years ago - I was there - the vast majority of the world's code was written in Cobol, Fortran and PL/1 and the vast majority of that ran on IBM 370-series mainframe computers, or compatibles. Most of the C code in the world ran on DEC's PDP-11 and VAX mini-computers. Unix and C were born on the PDP and DEC hardware was their stronghold.
This was the world from which the ANSI C committee came and in which they considered the practicalities of linking code written in C with the languages that really mattered, on the systems that really mattered.
Fortran compilers were Fortran 77 compilers and restricted identifiers to 6 characters. PL/1 compilers, back then, restricted external identifiers to 7 characters. The S/370 system linker truncated symbols to 8 characters. Not at all co-incidentally, the PDP-11 assembly language required symbols to be unique within the first 6 characters.
There weren't any pitchforks on the lawn of the ANSI C committee when it stipulated 6 initial significant characters for external identifiers. That meant a conforming compiler could be implemented on IBM mainframes; it need not be one for which the PDP-11 assembler was inadequate, and it would not be forced to emit code that couldn't even be linked with Fortran 77. It was a wholly unsensational choice. The ANSI C committee could no more have "put its foot down" about the IBM mainframe linker than it could have laid down the law about Soviet missile design.
"It is 1989 already. You must handle 31 significant initial characters". It would not have been a problem for any platform, even ancient ones.
You're wrong about that. Run Moore's Law backwards mentally for 30 years and try to imagine how puny computers were while that committee was at work. A mainframe computer that supported hundreds of users, as well as running all the data-processing systems of a large corporation, typically did it with less than the processing power, memory and storage resources I've got in my old Google Nexus tablet today.
An IBM 3380E hard disc unit, 1985, had a capacity of 5.0GB and cost around $120K; $270K in today's money. It had a transfer rate of 24Mbps, about 2% of what my laptop's HD delivers. With parameters like that, every byte that the system had to store, read or write, every disc rotation, every clock cycle, weighed on the bottom line. And this had always been the case, only more so. A miser-like economy of storage, at byte granularity, was ingrained in programming practice, and those short public symbol names were just one expression of it.
The problem was not, of course, that the puny, fabulously expensive mainframes and minis that dominated the culture and the counsels of the 1980s could not have supported languages, compilers, linkers and programming practices in which this miserly economy of storage (and everything else) was tossed away. Of course they could, if everybody had one, like a laptop or a mobile phone. What they couldn't do, without it, was support the huge multi-user workloads that they were bought to run. The software needed to be excruciatingly lean to do so much with so little.