
What really is EOF for binary files? Condition? Character?

I have managed this far with the knowledge that EOF is a special character inserted automatically at the end of a text file to indicate its end. But I now feel the need for some more clarification on this. I checked on Google and the Wikipedia page for EOF but they couldn't answer the following, and there are no exact Stack Overflow links for this either. So please help me on this:

  • My book says that binary mode files keep track of the end of file from the number of characters present in the directory entry of the file. (In contrast to text files which have a special EOF character to mark the end). So what is the story of EOF in context of binary files? I am confused because in the following program I successfully use !=EOF comparison while reading from an .exe file in binary mode:

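     #include <stdio.h>
     #include <stdlib.h>

     int main(void)
     {
         int ch;   /* int, not char, so EOF can be told apart from any byte */
         FILE *fp1, *fp2;

         fp1 = fopen("source.exe", "rb");
         fp2 = fopen("dest.exe", "wb");
         if (fp1 == NULL || fp2 == NULL)
         {
             printf("Error opening files");
             exit(-1);
         }

         /* copy byte by byte until getc() returns EOF */
         while ((ch = getc(fp1)) != EOF)
             putc(ch, fp2);

         fclose(fp1);
         fclose(fp2);
         return 0;
     }
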
  • Is EOF a special "character" at all? Or is it a condition as Wikipedia says, a condition where the computer knows when to return a particular value like -1 (EOF on my computer)? Example of such "condition" being when a character-reading function finishes reading all characters present, or when character/string I/O functions encounter an error in reading/writing?

    Interestingly, the Stack Overflow tag for EOF blends both of those definitions. The tag says "In programming realm, EOF is a sequence of byte (or a character) which indicates that there are no more contents after this.", while its "about" section says "End of file (commonly abbreviated EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream."

But I have a strong feeling EOF won't be a character as every other function seems to be returning it when it encounters an error during I/O.

It will be really nice of you if you can clear the matter for me.

asked May 21 '13 19:05 by Thokchom




2 Answers

The various EOF indicators that C provides to you do not necessarily have anything to do with how the file system marks the end of a file.

Most modern file systems know the length of a file because they record it somewhere, separately from the contents of the file. The routines that read the file keep track of where you are reading and they stop when you reach the end. The C library routines generate an EOF value to return to you; they are not returning a value that is actually in the file.

Note that the EOF returned by C library routines is not actually a character. The C library routines generally return an int, and that int is either a character value or an EOF. E.g., in one implementation, the characters might have values from 0 to 255, and EOF might have the value −1. When the library routine encountered the end of the file, it did not actually see a −1 character, because there is no such character. Instead, it was told by the underlying system routine that the end of file had been reached, and it responded by returning −1 to you.

Old and crude file systems might have a value in the file that marks the end of file. For various reasons, this is usually undesirable. In its simplest implementation, it makes it impossible to store arbitrary data in the file, because you cannot store the end-of-file marker as data. One could, however, have an implementation in which the raw data in the file contains something that indicates the end of file, but data is transformed when reading or writing so that arbitrary data can be stored. (E.g., by “quoting” the end-of-file marker.)

In certain cases, things like end-of-file markers also appear in streams. This is common when reading from the terminal (or a pseudo-terminal or terminal-like device). On Windows, pressing control-Z indicates that the user is done entering input, and it is treated much like reaching the end of a file. This does not mean that control-Z is an EOF. The software reading from the terminal sees control-Z, treats it as end-of-file, and returns end-of-file indications, which are likely different from control-Z. On Unix, control-D is commonly a similar sentinel marking the end of input.

answered Oct 06 '22 01:10 by Eric Postpischil


This should clear it up nicely for you.

Basically, EOF is just a macro with a predefined negative int value. I/O functions return it to indicate that there is no more data to be read, and they return the same value when a read fails.

answered Oct 05 '22 23:10 by Christopher Neylan