Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can sizeof(int) ever be 1 on a hosted implementation?

Tags:

c

My view is that a C implementation cannot satisfy the specification of certain stdio functions (particularly fputc/fgetc) if sizeof(int)==1, since the int needs to be able to hold any possible value of unsigned char or EOF (-1). Is this reasoning correct?

(Obviously sizeof(int) cannot be 1 if CHAR_BIT is 8, due to the minimum required range for int, so we're implicitly only talking about implementations with CHAR_BIT>=16, for instance DSPs, where typical implementations would be a freestanding implementation rather than a hosted implementation, and thus not required to provide stdio.)

Edit: After reading the answers and some links references, some thoughts on ways it might be valid for a hosted implementation to have sizeof(int)==1:

First, some citations:

7.19.7.1(2-3):

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the endof-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

7.19.8.1(2):

The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read.

Thoughts:

  • Reading back unsigned char values outside the range of int could simply have undefined implementation-defined behavior in the implementation. This is particularly unsettling, as it means that using fwrite and fread to store binary structures (which while it results in nonportable files, is supposed to be an operation you can perform portably on any single implementation) could appear to work but silently fail. essentially always results in undefined behavior. I accept that an implementation might not have a usable filesystem, but it's a lot harder to accept that an implementation could have a filesystem that automatically invokes nasal demons as soon as you try to use it, and no way to determine that it's unusable. Now that I realize the behavior is implementation-defined and not undefined, it's not quite so unsettling, and I think this might be a valid (although undesirable) implementation.

  • An implementation sizeof(int)==1 could simply define the filesystem to be empty and read-only. Then there would be no way an application could read any data written by itself, only from an input device on stdin which could be implemented so as to only give positive char values which fit in int.

Edit (again): From the C99 Rationale, 7.4:

EOF is traditionally -1, but may be any negative integer, and hence distinguishable from any valid character code.

This seems to indicate that sizeof(int) may not be 1, or at least that such was the intention of the committee.

like image 302
R.. GitHub STOP HELPING ICE Avatar asked Oct 05 '10 04:10

R.. GitHub STOP HELPING ICE


4 Answers

It is possible for an implementation to meet the interface requirements for fgetc and fputc even if sizeof(int) == 1.

The interface for fgetc says that it returns the character read as an unsigned char converted to an int. Nowhere does it say that this value cannot be EOF even though the expectation is clearly that valid reads "usually" return positive values. Of course, fgetc returns EOF on a read failure or end of stream but in these cases the file's error indicator or end-of-file indicator (respectively) is also set.

Similarly, nowhere does it say that you can't pass EOF to fputc so long as that happens to coincide with the value of an unsigned char converted to an int.

Obviously the programmer has to be very careful on such platforms. This is might not do a full copy:

void Copy(FILE *out, FILE *in)
{
    int c;
    while((c = fgetc(in)) != EOF)
        fputc(c, out);
}

Instead, you would have to do something like (not tested!):

void Copy(FILE *out, FILE *in)
{
    int c;
    while((c = fgetc(in)) != EOF || (!feof(in) && !ferror(in)))
        fputc(c, out);
}

Of course, platforms where you will have real problems are those where sizeof(int) == 1 and the conversion from unsigned char to int is not an injection. I believe that this would necessarily the case on platforms using sign and magnitude or ones complement for representation of signed integers.

like image 84
CB Bailey Avatar answered Nov 02 '22 16:11

CB Bailey


I remember this exact same question on comp.lang.c some 10 or 15 years ago. Searching for it, I've found a more current discussion here:

http://groups.google.de/group/comp.lang.c/browse_thread/thread/9047fe9cc86e1c6a/cb362cbc90e017ac

I think there are two resulting facts:

(a) There can be implementations where strict conformance is not possible. E.g. sizeof(int)==1 with one-complement's or sign-magnitude negative values or padding bits in the int type, i.e. not all unsigned char values can be converted to a valid int value.

(b) The typical idiom ((c=fgetc(in))!=EOF) is not portable (except for CHAR_BIT==8), as EOF is not required to be a separate value.

like image 28
Secure Avatar answered Nov 02 '22 18:11

Secure


I don't believe the C standard directly requires that EOF be distinct from any value that could be read from a stream. At the same time, it does seem to take for granted that it will be. Some parts of the standard have conflicting requirements that I doubt can be met if EOF is a value that could be read from a stream.

For example, consider ungetc. On one hand, the specification says (§7.19.7.11):

The ungetc function pushes the character specified by c (converted to an unsigned char) back onto the input stream pointed to by stream. Pushed-back characters will be returned by subsequent reads on that stream in the reverse order of their pushing. [ ... ] One character of pushback is guaranteed.

On the other hand, it also says:

If the value of c equals that of the macro EOF, the operation fails and the input stream is unchanged.

So, if EOF is a value that could be read from the stream, and (for example) we do read from the stream, and immediately use ungetc to put EOF back into the stream, we get a conundrum: the call is "guaranteed" to succeed, but also explicitly required to fail.

Unless somebody can see a way to reconcile these requirements, I'm left with considerable doubt as to whether such an implementation can conform.

In case anybody cares, N1548 (the current draft of the new C standard) retains the same requirements.

like image 5
Jerry Coffin Avatar answered Nov 02 '22 16:11

Jerry Coffin


Would it not be sufficient if a nominal char which shared a bit pattern with EOF was defined as non-sensical? If, for instance, CHAR_BIT was 16 but all the allowed values occupied only the 15 least significant bits (assume a 2s-complement of sign-magnitude int representation). Or must everything representable in a char have meaning as such? I confess I don't know.

Sure, that would be a weird beast, but we're letting our imaginations go here, right?

R.. has convinced me that this won't hold together. Because a hosted implementation must implement stdio.h and if fwrite is to be able to stick integers on the disk, then fgetc could return any bit pattern that would fit in a char, and that must not interfere with returning EOF. QED.

like image 3
dmckee --- ex-moderator kitten Avatar answered Nov 02 '22 16:11

dmckee --- ex-moderator kitten