
How EOF is defined for binary and ascii files

Tags: c, file, windows

I'm programming in C on Windows (the system language is Japanese), and I have a problem understanding EOF for binary and ASCII files.

I asked this question last week and someone kindly helped me, but I still don't really understand how the program behaves when reading a binary or an ASCII file.

I did the following test:

Test1:

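int oneChar;  /* int, not char, so that EOF can be told apart from data */
FILE *iFile = fopen("myFile.tar.gz", "rb");  /* binary mode */
if (iFile == NULL) { perror("fopen"); return 1; }
while ((oneChar = fgetc(iFile)) != EOF) {
        printf("%d ", oneChar);
}
fclose(iFile);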

Test2:

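int oneChar;  /* int, not char, so that EOF can be told apart from data */
FILE *iFile = fopen("myFile.tar.gz", "r");  /* text mode */
if (iFile == NULL) { perror("fopen"); return 1; }
while ((oneChar = fgetc(iFile)) != EOF) {
        printf("%d ", oneChar);
}
fclose(iFile);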

In Test1, everything worked perfectly for both binary and ASCII files. In Test2, however, the program stopped reading when it encountered the byte 0x1A in a binary file. (Does this mean that 0x1A == EOF?) The ASCII table tells me that 0x1A is a control character called SUB (substitute), whatever that means. Yet when I ran printf("%d", EOF), it printed -1.

I also found this question, which tells me that the OS knows exactly where a file ends, so I don't really need to look for EOF inside the file, because EOF is outside the range of a byte. (But then what about 0x1A?)

Can someone clear things up a little for me? Thanks in advance.

Ema asked Dec 11 '22 18:12



1 Answer

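This is a Windows-specific convention for text files: when a file is opened in text mode ("r"), the C runtime treats the SUB character (0x1A, typed as Ctrl+Z) as an end-of-file marker, so fgetc returns EOF when it reaches that byte. In binary mode ("rb") no such interpretation takes place, which is why Test1 read the whole file. A text file does not need to contain 0x1A for fgetc to return EOF, though: once you reach the actual end of the file, EOF is returned.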
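The difference between the two modes can be checked with a small experiment. The following is only a sketch: the helpers count_bytes and write_sample are hypothetical names made up for this illustration, and the file name is whatever the caller chooses.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many bytes fgetc delivers before it
   returns EOF when `path` is opened with the given mode ("r" or "rb").
   Returns -1 if the file cannot be opened. */
long count_bytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;

    long n = 0;
    int c;                      /* int, so EOF stays distinguishable */
    while ((c = fgetc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}

/* Hypothetical helper: writes the three bytes 'A', 0x1A, 'B' to `path`
   in binary mode. Returns 0 on success, -1 on failure. */
int write_sample(const char *path)
{
    static const unsigned char data[] = { 'A', 0x1A, 'B' };
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    size_t written = fwrite(data, 1, sizeof data, fp);
    int close_failed = fclose(fp);
    return (written == sizeof data && close_failed == 0) ? 0 : -1;
}
```

After write_sample("sample.bin"), count_bytes("sample.bin", "rb") should report 3 on any platform. With "r" and the Microsoft C runtime, the count should stop at the 0x1A byte, as in Test2; on Linux or macOS, "r" and "rb" behave the same, so both calls should report 3.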

The C standard does not define 0x1A as the character value that represents EOF. EOF is a macro of type int with a negative value, outside the range of unsigned char (it happens to be -1 in your test). In fact, the reason fgetc returns an int rather than a char is so that it can return this special value in addition to every possible byte value.
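To see why the int return type matters, compare a loop that keeps the result of fgetc in an int with one that narrows it to a char type. This is an illustrative sketch only: both function names are hypothetical, and the buggy variant assumes an implementation where EOF is -1 and where converting 255 to signed char yields -1, which is common but not guaranteed by the standard.

```c
#include <stdio.h>

/* Correct: the fgetc result is kept in an int, so every byte value
   0..255 is distinct from the negative EOF value. */
long count_with_int(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Deliberately buggy, for comparison: the result is narrowed to
   signed char, so a 0xFF data byte typically becomes -1 and compares
   equal to EOF, ending the loop too early. */
long count_with_signed_char(FILE *fp)
{
    long n = 0;
    signed char c;
    while ((c = (signed char)fgetc(fp)) != EOF)
        n++;
    return n;
}
```

Given a binary stream that contains the bytes 'A', 0xFF, 'B', count_with_int returns 3, while count_with_signed_char typically returns 1 because it mistakes the 0xFF byte for the end of the file. Neither function treats 0x1A specially; that behaviour comes only from opening a file in text mode on Windows.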

Sergey Kalinichenko answered Dec 27 '22 23:12
