EOF reached before end of file

Tags: c, linux, scanf

I'm writing a multiprocess program for school in which each process reads a portion of a file, and together they count the number of words in the file. When there are more than 2 processes, every process reads EOF from the file before it has finished reading its portion. Here's the relevant code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

int main(int argc, char *argv[]) {

    FILE *input_textfile = NULL;
    char input_word[1024];
    int num_processes = 0;
    int proc_num = 0; //The index of this process (used after forking)
    long file_size = -1;

    input_textfile = fopen(argv[1], "r");
    num_processes = atoi(argv[2]);

    //...Normally error checking would go here

    if (num_processes > 1) {

        //...create space for pipes

        for (proc_num = 0; proc_num < num_processes - 1; proc_num++) {

            //...create pipes

            pid_t proc = fork();

            if (proc == -1) {
                fprintf(stderr,"Could not fork process index %d", proc_num);
                perror("");
                return 1;
            } else if (proc == 0) {
                break;
            }

            //...link up the pipes
        }
    }

    //This code taken from http://stackoverflow.com/questions/238603/how-can-i-get-a-files-size-in-c
    //Interestingly, it also fixes a bug we had where the child would start reading at an unpredictable place
    //No idea why, but apparently the offset wasn't guaranteed to start at 0 for some reason
    fseek(input_textfile, 0L, SEEK_END);
    file_size = ftell(input_textfile);
    fseek(input_textfile, proc_num * (1.0 * file_size / num_processes), 0);

    //read all words from the file and add them to the linked list
    if (file_size != 0) {

        //Explanation of this mess of a while loop:
        //  if we're a child process (proc_num < num_processes - 1), then loop until we make it to where the next
        //  process would start (the ftell part)
        //  if we're the parent (proc_num == num_processes - 1), loop until we reach the end of the file
        while ((proc_num < num_processes - 1 && ftell(input_textfile) < (proc_num + 1) * (1.0 * file_size / num_processes))
                || (proc_num == num_processes - 1 && ftell(input_textfile) < file_size)){
            int res = fscanf(input_textfile, "%s", input_word);

            if (res == 1) {
                //count the word
            } else if (res == EOF && errno != 0) {
                perror("Error reading file: ");
                exit(1);
            } else if (res == EOF && ftell(input_textfile) < file_size) {
                printf("Process %d found unexpected EOF at %ld.\n", proc_num, ftell(input_textfile));
                exit(1);
            } else if (res == EOF && feof(input_textfile)){
                continue;
            } else {
                printf("Scanf returned unexpected value: %d\n", res);
                exit(1);
            }
        }
    }

    //don't get here anyway, so no point in closing files and whatnot

    return 0;
}

Output when running the file with 3 processes:

All files opened successfully
Process 2 found unexpected EOF at 1323008.
Process 1 found unexpected EOF at 823849.
Process 0 found unexpected EOF at 331776.

The test file that causes the error: https://dl.dropboxusercontent.com/u/16835571/test34.txt

Compile with:

gcc main.c -o wordc-mp

and run as:

wordc-mp test34.txt 3

It's worth noting that only that particular file gives me issues, but the offsets at which the error occurs change from run to run, so I don't think the contents of the file are the cause.

Evan Allan asked Oct 19 '22 13:10


1 Answer

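You opened the file before forking. Each child inherits a copy of the parent's file descriptor, and every copy refers to the same open file description as the parent's, so all the processes share a single file offset. When one process reads or seeks, the position advances for every other process as well. Each process has its own copy of the stdio buffer, but every refill draws from that one shared offset, which is why the reported positions change from run to run.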
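The shared offset can be observed directly. The following is a minimal sketch, not code from the question: the function name parent_offset_after_child_read is hypothetical, and it uses a raw descriptor with read() and lseek() instead of a FILE * so that stdio buffering does not blur the result. The child reads n bytes and exits; the parent, which never read anything itself, then asks for its own position.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Opens `path` BEFORE forking, lets the child read up to `n` bytes
 * (capped at 256), and returns the offset the PARENT sees afterwards.
 * Returns -1 if open() or fork() fails. */
off_t parent_offset_after_child_read(const char *path, size_t n) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: read through the inherited descriptor, then exit. */
        char buf[256];
        if (n > sizeof buf) n = sizeof buf;
        ssize_t got = read(fd, buf, n);
        _exit(got < 0 ? 1 : 0);
    }

    /* Parent: wait for the child, then query the current offset
     * without having read a single byte in this process. */
    int status;
    waitpid(pid, &status, 0);
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

If the file holds at least n bytes, the parent should see its offset moved to n by the child's read, which is the same effect that makes the processes in the question step on each other's positions.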

The fork(2) man page ("man fork") confirms this:

  • The child process is created with a single thread—the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.

  • The child inherits copies of the parent's set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
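One way to avoid the shared offset is to have every process call fopen() itself after fork(), so that each one gets its own open file description. The sketch below is an assumption about how that could look, not the asker's code: count_words_in_range is a hypothetical helper, and it counts words by scanning characters with fgetc() rather than using fscanf("%s"), so that a word straddling a slice boundary is counted only by the process in whose slice it starts.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper for illustration only.
 * Counts the words that START at a byte offset in [start, end) of `path`.
 * Intended to be called by each process AFTER fork(): the fopen() here
 * creates a private open file description with its own offset.
 * Returns -1 if the file cannot be opened or positioned. */
long count_words_in_range(const char *path, long start, long end) {
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;

    int prev_is_space = 1;
    if (start > 0) {
        /* Peek at the byte just before the slice: if it is not
         * whitespace, the slice begins in the middle of a word that
         * belongs to the previous slice. */
        if (fseek(fp, start - 1, SEEK_SET) != 0) {
            fclose(fp);
            return -1;
        }
        int before = fgetc(fp);
        prev_is_space = (before == EOF) || isspace(before);
    }

    long count = 0;
    long pos = start;
    int c;
    while (pos < end && (c = fgetc(fp)) != EOF) {
        int is_space = isspace(c);
        if (!is_space && prev_is_space) count++; /* a new word starts here */
        prev_is_space = is_space;
        pos++;
    }

    fclose(fp);
    return count;
}
```

In the question's program, each process could call this after the fork loop with argv[1] and the slice bounds derived from proc_num, file_size, and num_processes, instead of sharing input_textfile across the fork.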

Jonathan Schoreels answered Oct 21 '22 08:10