
Is it undefined behavior to exceed translation limits and are there checker tools to find it?

ORIGINAL QUESTION:

I'm searching the C90 standard for things to be aware of when writing highly portable code, while having little trust in the goodwill of the compiler vendor, and assuming that my software might someday kill somebody if I do things wrong. Let's say I'm a little paranoid.

At the moment I am thinking about the "Translation limits" (5.2.4.1 ANSI/ISO 9899:1990). As pointed out in the standard and in "Does ansi C place a limit on the number of external variables in a program?", those are minimum requirements for a standard-conforming implementation. On the other hand, this means that an implementation does not have to do more, and if I want to be sure that my code works on any conforming implementation, these limits represent absolute limits for me.

So far so annoying.

So the compiler vendor chooses limits that equal or exceed the minimum required translation limits.

What happens if one exceeds the implementation-defined translation limits of a specific implementation? In my copy of ANSI/ISO 9899:1990 (C90) I haven't found anything, so I think it is undefined behavior "of the 3rd kind" (by omission). On the other hand, this would not be the first time that I misunderstood the standard or didn't find the right passage.

So here are my questions:

  • Is exceeding the translation limits of a specific implementation undefined behavior in C90?

  • Does the C90 behavior also hold for the corrected versions up to C95/C96 and for the newer revisions C99 and C11?

  • Has anyone seen a checker tool that checks code against the minimum limits, or against limits defined by the tool's user?

ASPECTS BEYOND THE ORIGINAL QUESTION:

Interesting aspects in answers and comments:

1) As Michael Burr pointed out in a direct comment on the question, according to the C standard (I have only checked C90 without corrigenda, and the C99 draft that Michael referenced) a conforming C implementation only needs to accept ONE program that contains all the limits at the same time, which in the strictest interpretation nullifies any guarantee of minimum limits.

2) As rubenvb and Keith Thompson pointed out, implementations of reasonable quality should provide diagnostics when their implementation-defined limits are exceeded, especially if those limits do not conform to the minimum requirements (rubenvb linked an example for MSVC in a comment).

3) Since exceeding the compiler limits may be undefined behavior, but will surely lead to some error, the values of the "variables" to which the translation limits apply, measured for a given piece of my code, represent preconditions for its reuse.
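One way to make such preconditions explicit in the source is a preprocessor guard. The following is only a sketch under stated assumptions: the `MODULE_*` values are hypothetical numbers that would have to come from measuring the code, and the `IMPL_*` macros are hypothetical names that the build would set from the compiler vendor's documentation (they default here to the C90 minimums of 15 nesting levels and 31 parameters). Whether translation actually stops at `#error` on a strict C90 implementation is something to confirm in the compiler documentation.

```c
/* Hypothetical values measured for this module, e.g. by a checker tool. */
#define MODULE_BLOCK_NESTING 9
#define MODULE_MAX_PARAMS    12

/*
 * Hypothetical macros for the limits of the target implementation.
 * The build may override them (e.g. with -D) from vendor documentation;
 * otherwise the C90 minimum translation limits are assumed.
 */
#ifndef IMPL_BLOCK_NESTING_LIMIT
#define IMPL_BLOCK_NESTING_LIMIT 15
#endif
#ifndef IMPL_PARAM_LIMIT
#define IMPL_PARAM_LIMIT 31
#endif

/* Turn a violated precondition into a diagnostic of our own. */
#if MODULE_BLOCK_NESTING > IMPL_BLOCK_NESTING_LIMIT
#error "module exceeds the block nesting limit assumed for this implementation"
#endif

#if MODULE_MAX_PARAMS > IMPL_PARAM_LIMIT
#error "module exceeds the parameter count limit assumed for this implementation"
#endif
```

The guard does not measure anything itself; it only records measured values next to the code and compares them with whatever limits the build claims for the target.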

My personal strategies to deal with them

1) For maximal paranoia, I will make a fool of myself and annoy the compiler vendors' support with a request to guarantee that the limits chosen by the implementation apply to any program. :-(

2) I will study the compiler documentation, and test the patience of the compiler vendors' support, to get confirmation that: - for every translation limit, a diagnostic is raised when it is exceeded, and - because it is undefined behavior, every instance of exceeding a translation limit raises a diagnostic, unless another error has already prevented compilation.

3) I will try to get my hands on a tool (or develop one myself if I really must) that measures those values and provides them as preconditions for reusing my code. As Keith Thompson pointed out in his answer, some of the values might require deeper knowledge of how the implementation is... implemented. I am not yet sure what can help in such cases beyond the actions in 2). As far as I can see, I have to test. But I only need to test if there is UB without a diagnostic, and in that case a successful test cannot guarantee correctness in the general case.
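To illustrate what the simplest form of such a measuring tool might look like, here is a rough sketch, not an existing tool. It scans raw source text and reports the deepest brace nesting and the longest physical line, then compares them with two C90 minimums (15 nesting levels, 509 characters in a logical source line). All names are made up for this example. The simplifications are significant: braces inside comments, string literals and character constants are counted, backslash-newline splicing is ignored, and brace depth is only a proxy for the nesting of compound statements and control structures (unbraced nested `if` statements are missed, and struct or initializer braces are counted).

```c
#include <assert.h>
#include <stddef.h>

/* C90 minimum translation limits (5.2.4.1) used as thresholds. */
#define C90_MIN_BLOCK_NESTING 15
#define C90_MIN_LINE_CHARS    509

struct limit_report {
    int    max_brace_depth;  /* deepest '{' nesting seen */
    size_t max_line_length;  /* longest physical line, in characters */
};

/*
 * Scan NUL-terminated source text.
 * Assumption: no real tokenizing is done, so braces in comments and
 * literals are counted and line splicing is ignored.
 */
struct limit_report measure_limits(const char *src)
{
    struct limit_report r;
    int depth = 0;
    size_t line_len = 0;
    const char *p;

    assert(src != NULL);
    r.max_brace_depth = 0;
    r.max_line_length = 0;

    for (p = src; *p != '\0'; ++p) {
        if (*p == '\n') {
            if (line_len > r.max_line_length)
                r.max_line_length = line_len;
            line_len = 0;
            continue;
        }
        ++line_len;
        if (*p == '{') {
            ++depth;
            if (depth > r.max_brace_depth)
                r.max_brace_depth = depth;
        } else if (*p == '}' && depth > 0) {
            --depth;
        }
    }
    /* Account for a last line without a trailing newline. */
    if (line_len > r.max_line_length)
        r.max_line_length = line_len;
    return r;
}

/* Returns 1 if either measured value exceeds its C90 minimum, else 0. */
int exceeds_c90_minimums(const struct limit_report *r)
{
    return r->max_brace_depth > C90_MIN_BLOCK_NESTING
        || r->max_line_length > C90_MIN_LINE_CHARS;
}
```

A real checker would have to work on preprocessed tokens rather than characters, and limits such as object size or the number of external identifiers need information that only a parser (or the compiler itself) has.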

ANSWERED:

Yes, it is undefined behavior by omission.

Keith Thompson has shown in his (accepted) answer, using the terminology of and references to the C standard documents, that it is undefined behavior.

A tool that checks translation limits in the code has not (yet) been found by the commenters. If anyone knows of a tool that has this functionality (even partly), please leave an answer or comment.

Mark A. asked May 19 '14 05:05


1 Answer

I believe the behavior is undefined.

The standard requires a diagnostic for any translation unit that violates a constraint or syntax rule (N1570 5.1.1.3), and an implementation may not successfully translate a translation unit that contains a #error directive that survives the preprocessing phase (N1570 4, paragraph 4). (N1570 is a draft of the C11 standard, but this is the same across C90, C99, and C11, except that the #error requirement was added by C99.)

All constraints and syntax rules are specified explicitly in the standard. Exceeding an implementation-defined limit violates neither a constraint nor a syntax rule. It's sufficiently obvious, I think, that an implementation is not required to successfully process an otherwise correct program that exceeds a translation limit, but the standard says nothing about how it should respond to such a violation. Therefore, the behavior is undefined by omission.

(An implementation of decent quality would issue a diagnostic saying that a limit has been exceeded, but this is not required by the standard.)

To answer the third part of your question, no, I haven't heard of a static checker tool that checks programs for violations of the minimum translation limits. Such a tool could be quite useful, and probably wouldn't be too difficult to write once you have a C parser. For the limit on the size of an object (32767 bytes in C90, 65535 bytes in C99 and C11), it would have to know how the compiler determines object sizes; int arr[30000]; may or may not exceed 65535 bytes, depending on sizeof (int). I wouldn't be too surprised if someone has already implemented such a tool and I just haven't heard of it.
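The dependence on sizeof (int) can be made concrete with a small helper. This is only an illustrative sketch; the function and macro names are made up here. It tests whether an array of a given element count and element size stays within a byte limit, using division so that the multiplication cannot overflow.

```c
#include <assert.h>

/* Minimum object-size limits quoted above. */
#define C90_MIN_OBJECT_BYTES 32767UL
#define C99_MIN_OBJECT_BYTES 65535UL

/*
 * Returns 1 if an array of `count` elements of `elem_size` bytes each
 * fits within `limit` bytes, otherwise 0.
 * count * elem_size <= limit is checked as count <= limit / elem_size
 * to avoid overflow in the multiplication.
 */
int array_fits_limit(unsigned long count, unsigned long elem_size,
                     unsigned long limit)
{
    assert(elem_size > 0);
    return count <= limit / elem_size;
}
```

For int arr[30000], `array_fits_limit(30000UL, 2UL, C99_MIN_OBJECT_BYTES)` is 1 (60000 bytes with a 2-byte int), while `array_fits_limit(30000UL, 4UL, C99_MIN_OBJECT_BYTES)` is 0 (120000 bytes with a 4-byte int). A checker would need the element size of the target implementation, which is not necessarily `sizeof (int)` on the machine running the tool.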

Note that most implementations do not impose the fixed limits that the standard permits; rather, any limits are imposed by the memory resources available at compile time.

The standard does present the translation limits in a rather odd way. I'm thinking in particular of the clause that says:

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:

(that's section 5.2.4.1 in C90, C99, and C11). So a perverse implementation could accept exactly one program and reject all others.

The point, I think, is that specifying reasonable limits that all implementations must meet would be impractical. The standard could say that all implementations must always accept objects of at least 32767 bytes -- but what about a program that defines a million such objects? The limits interact with each other in extremely complex ways, and the nature of the interaction depends on the internal structure of each compiler. (If you think you can define the requirements for translation limits better than the C standard does, I encourage you to try it.)

Instead, the standard states the requirements in such a way that the easiest way to implement a useful compiler that obeys the letter of the standard is to implement a useful compiler that obeys the spirit of the standard, by not imposing any unreasonable limits. A useless compiler that meets the letter of the standard is possible but irrelevant; I don't know that anybody has ever implemented such a thing, and I'm sure nobody would attempt to use it.

Keith Thompson answered Nov 15 '22 21:11