 

Strict ISO C Conformance Test

I am currently working on a C project that needs to be fairly portable across different build environments. The project targets POSIX-compliant systems on a hosted C environment.

One way to achieve a good degree of portability is to write code that conforms to a chosen standard, but it is difficult to determine whether a given translation unit is strictly conforming to ISO C. For example, it might exceed some translation limit, or it might rely on undefined behavior, without any diagnostic message from the compilation environment. I am not even sure whether it is possible to check large projects for strict conformance.

With that in mind, is there any compiler, tool or method to test for strict ISO C conformance under a given standard (for example, C89 or C99) of a translation unit?

Any help is appreciated.

alecov asked Aug 09 '10 20:08



2 Answers

In general, it is not possible to detect undefined run-time behavior ahead of time. For example, consider

void foo(int *p, int *q)
{
    *p = (*q)++;
    /* ... */
}

which is undefined if p == q. Whether that can happen can't be determined ahead of time without solving the halting problem.
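To make the point concrete, here is a minimal sketch (not from the original answer). `foo_sequenced` and `run_case` are hypothetical names: the first spells out one possible fully sequenced ordering of the same operations, and the second shows that whether the pointers alias can depend on a value known only at run time. Note that the ordering chosen here is an assumption; the original expression has no defined ordering at all when p == q.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: same intent as foo, but every read and write is a
   separate statement, so the behavior is defined even when p == q. */
void foo_sequenced(int *p, int *q)
{
    int old;

    assert(p != NULL && q != NULL);  /* precondition assumed for this sketch */
    old = *q;       /* read the old value of *q */
    *q = old + 1;   /* the increment that (*q)++ would perform */
    *p = old;       /* the assignment; if p == q it overwrites the increment */
}

/* Hypothetical caller: aliasing is decided by a run-time value, which is
   why a checker cannot settle the question at compile time. */
int run_case(int alias)
{
    int a = 1, b = 10;
    int *p = &a;
    int *q = alias ? &a : &b;

    foo_sequenced(p, q);
    return a;   /* 10 when q points at b, 1 when q aliases a */
}
```

If `alias` came from user input or `argc`, a static checker would have to reason about every possible execution to rule the aliasing case in or out.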

(Edited to fix mistake caf pointed out. Thanks, caf.)

David Thornley answered Nov 09 '22 01:11



Not really. The C standard doesn't set any absolute minimum limits on translation units that must be accepted. As such, a perfectly accurate checker would be trivial to write, but utterly useless in practice:

#include <stdio.h>

int main(int argc, char **argv) { 
    int i;
    for (i=1; i<argc; i++)
        fprintf(stderr, "`%s`: Translation limit (potentially) exceeded.\n", argv[i]);
    return 0;
}

Yes, this rejects everything, no matter how trivial. That is in accordance with the standard. As I said, it's utterly useless in practice. Unfortunately, you can't really do a whole lot better -- when you decide to port to a different implementation, you could run into some oddball resource limit you've never seen before, so any code you write (up to and including "hello world") could potentially exceed a resource limit despite being accepted by dozens or even hundreds of compilers on/for much smaller systems.

Edit:

Why a "hello world" program isn't strictly conforming

First, it's worth re-stating the definition of "strictly conforming": "A strictly conforming program shall use only those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit." (§4/5)

There are actually a number of reasons "Hello, World" isn't strictly conforming. First, as implied above, the minimum requirements for implementation limits are completely meaningless -- although there has to be some program that meets certain limits that will be accepted, no other program has to be accepted, even if it doesn't even come close to any of those limits. Given the way the requirement is stated, it's open to question (at best) whether there is any such thing as a program that doesn't exceed any minimum implementation limit, because the standard doesn't really define any minimum implementation limits.

Second, during phase 1 of translation: "Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set ... " (§5.1.1.2/1). Since "Hello, World!" (or whatever variant you prefer) is supplied as a string literal in the source file, it can be (is) mapped in an implementation-defined manner to the source character set. An implementation is free to decide that (for an idiotic example) string literals will be ROT13 encoded, and as long as that fact is properly documented, it's perfectly legitimate.
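As a rough illustration of what such a (deliberately silly) mapping would do to the literal, here is a small sketch. `rot13_copy` is a hypothetical helper, not something any real implementation is known to apply. It uses lookup tables rather than arithmetic on character codes, because the standard only guarantees that the digits are contiguous, not the letters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: copies src to dst with ROT13 applied to the
   26 basic Latin letters, writing at most dstsize - 1 characters plus a
   terminating '\0'. Other characters are copied unchanged. */
void rot13_copy(char *dst, size_t dstsize, const char *src)
{
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dstsize == 0)
        return;

    for (i = 0; src[i] != '\0' && i + 1 < dstsize; i++) {
        const char *pos;

        if ((pos = strchr(lower, src[i])) != NULL)
            dst[i] = lower[((size_t)(pos - lower) + 13) % 26];
        else if ((pos = strchr(upper, src[i])) != NULL)
            dst[i] = upper[((size_t)(pos - upper) + 13) % 26];
        else
            dst[i] = src[i];
    }
    dst[i] = '\0';
}
```

Passing "Hello, World!" through this function yields "Uryyb, Jbeyq!", which is what a program's output would look like if its string literals had been mapped this way.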

Third, the output is normally written via stdout. stdout is a text stream. According to the standard: "Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation." (§7.19.2/2) As such, an implementation could (for example) do Huffman compression on the output (on Monday, Wednesday, or Friday).
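One way to observe this on a particular implementation is to write through a text stream and then count what actually landed in the file by reading it back as a binary stream. The sketch below assumes the file at `path` can be created and reopened; `external_length` is a hypothetical name, and the result is by nature implementation-dependent.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes text to path through a text stream, reopens
   the file as a binary stream, and returns the number of bytes stored.
   Returns -1 if any I/O step fails. */
long external_length(const char *path, const char *text)
{
    FILE *fp;
    long count = 0;

    assert(path != NULL && text != NULL);

    fp = fopen(path, "w");      /* text stream: output may be translated */
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) == EOF)
        return -1;

    fp = fopen(path, "rb");     /* binary stream: bytes as stored */
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

On typical POSIX systems the count for "Hello, World!\n" matches the string length, while implementations that store a newline as two bytes report one more. Neither result is guaranteed by the standard.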

So, we have (at least) three distinct points at which the output from a "Hello, World!" program depends on implementation-defined characteristics -- any one of which would prevent it from fitting the definition of a strictly conforming program.

Jerry Coffin answered Nov 08 '22 23:11
