I have a question regarding section 5.2.4.1 Translation Limits in the first American National Standard for Programming Languages - C, also known as ANSI/ISO 9899-1990, ISO/IEC 9899:1990 (E), C89, etc. Simply put, the first ANSI C standard.
It infamously states that a conforming C compiler is only required to handle, and I quote:
5.2.4.1 Translation Limits
- 6 significant initial characters in an external identifier
Now, it is painfully obvious that this is unreasonably short, especially considering that C has nothing resembling namespaces. Descriptive names matter most for external identifiers, since they "pollute" everything you link.
Even the standard library mandates functions with longer names: longjmp, tmpfile, strncat. The last of these, strncat, shows that the committee had to work a bit to invent library names whose initial six characters were unique, instead of the arguably more logical strcatn, which would have collided with strcat.
I enjoy oldish computers. I'm trying to write programs that will compile and work well on pre-C99 platforms; C99 sometimes simply does not exist on my beloved targets. Perhaps I also enjoy trying to really follow the standard. I have learned a lot about C99 and C11 just by digging through the older standards, trying to trace the reasons for certain limitations and implementation issues.
So, even though I know of no compiler or linker actually enforcing or imposing this limitation, it still nags me that I can not claim to have written strictly conforming code if I also want to use legible and non-colliding external identifiers.
The committee began work on the standardization some time during the early eighties and finalized it in 1988 or 1989. Even in the seventies and sixties, handling longer identifiers would not have been any problem whatsoever.
Considering that any compiler wanting to conform to the new standard had to be modified anyway - if only to update the documentation - I don't see how it would have been unreasonable for ANSI to put its foot down and say something like "It is 1989 already. You must handle 31 significant initial characters". It would not have been a problem for any platform, even ancient ones.
From what I've read when searching for this, the problem might come from FORTRAN. In an answer to the question What's the exact role of "significant characters" in C (variables)?, Jonathan Leffler writes:
Part of the trouble may have been Fortran; it only required support for 6 character monocase names, so linkers on systems where Fortran was widely used did not need to support longer names.
To me, this seems like the most reasonable answer to the direct question Why?. But considering that this restriction bugs me every time I want to write a program that could theoretically be built on old systems, I would like to know some more details.
Ultimately, the answers to these questions will make it easier for me to decide how badly I should sleep at night for giving reasonable names to my functions.
Syntax. The first character of an identifier name must be a nondigit (that is, the first character must be an underscore or an uppercase or lowercase letter). ANSI allows six significant characters in an external identifier's name and 31 for names of internal (within a function) identifiers.
The minimum length of an identifier is 1 character. The maximum length of an identifier is currently 128 characters. An identifier must start with an alphanumeric or underscore character. The remainder can contain any combination of alphanumeric characters and underscore characters.
30 years ago - I was there - the vast majority of the world's code was written in Cobol, Fortran and PL/1 and the vast majority of that ran on IBM 370-series mainframe computers, or compatibles. Most of the C code in the world ran on DEC's PDP-11 and VAX mini-computers. Unix and C were born on the PDP and DEC hardware was their stronghold.
This was the world from which the ANSI C committee came and in which they considered the practicalities of linking code written in C with the languages that really mattered, on the systems that really mattered.
Fortran compilers were Fortran 77 compilers and restricted identifiers to 6 characters. PL/1 compilers, back then, restricted external identifiers to 7 characters. The S/370 system linker truncated symbols to 8 characters. Not at all co-incidentally, the PDP-11 assembly language required symbols to be unique within the first 6 characters.
There weren't any pitchforks on the lawn of the ANSI C committee when it stipulated 6 initial significant characters for external identifiers. That meant a conforming compiler could be implemented on IBM mainframes; it need not be one for which the PDP-11 assembler was inadequate, and it would not be forced to emit code that couldn't even be linked with Fortran 77. It was a wholly unsensational choice. The ANSI C committee could no more have "put its foot down" about the IBM mainframe linker than it could have laid down the law about Soviet missile design.
"It is 1989 already. You must handle 31 significant initial characters". It would not have been a problem for any platform, even ancient ones.
You're wrong about that. Run Moore's Law backwards mentally for 30 years and try to imagine how puny computers were while that committee was at work. A mainframe computer that supported hundreds of users, as well as running all the data-processing systems of a large corporation, typically did it with less than the processing power, memory and storage resources I've got in my old Google Nexus tablet today.
An IBM 3380E hard disc unit, 1985, had a capacity of 5.0GB and cost around $120K; $270K in today's money. It had a transfer rate of 24Mbps, about 2% of what my laptop's HD delivers. With parameters like that, every byte that the system had to store, read or write, every disc rotation, every clock cycle, weighed on the bottom line. And this had always been the case, only more so. A miser-like economy of storage, at byte granularity, was ingrained in programming practice, and those short public symbol names were just one expression of it.
The problem was not, of course, that the puny, fabulously expensive mainframes and minis that dominated the culture and the counsels of the 1980s could not have supported languages, compilers, linkers and programming practices in which this miserly economy of storage (and everything else) was tossed away. Of course they could, if everybody had one, like a laptop or a mobile phone. What they couldn't do, without it, was support the huge multi-user workloads that they were bought to run. The software needed to be excruciatingly lean to do so much with so little.