
fgetpos() behaviour depends on newline character

Consider these two files:

file1.txt (Windows newline)

abc\r\n
def\r\n

file2.txt (Unix newline)

abc\n
def\n

I've noticed that for file2.txt, the position obtained with fgetpos is not incremented correctly. I'm working on Windows.

Let me show you an example. The following code:

#include <cstdio>

void read(FILE *file)
{
    int c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);

    fpos_t pos;
    fgetpos(file, &pos); // save the position
    c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);

    fsetpos(file, &pos); // restore the position - should point to previous
    c = fgetc(file);     // character, which is not the case for file2.txt
    printf("%c (%d)\n", (char)c, c);
    c = fgetc(file);
    printf("%c (%d)\n", (char)c, c);
}

int main()
{
    FILE *file = fopen("file1.txt", "r");
    printf("file1:\n");
    read(file);
    fclose(file);

    file = fopen("file2.txt", "r");
    printf("\n\nfile2:\n");
    read(file);
    fclose(file);

    return 0;
}

produces this output:

file1:
a (97)
b (98)
b (98)
c (99)


file2:
a (97)
b (98)
  (-1)
  (-1)

file1.txt works as expected, while file2.txt behaves strangely. To investigate what goes wrong, I tried the following code:

#include <cstdio>

void read(FILE *file)
{
    int c;
    fpos_t pos;
    while (1)
    {
        fgetpos(file, &pos);
        printf("pos: %d ", (int)pos);
        c = fgetc(file);
        if (c == EOF) break;
        printf("c: %c (%d)\n", (char)c, c);
    }
}

int main()
{
    FILE *file = fopen("file1.txt", "r");
    printf("file1:\n");
    read(file);
    fclose(file);

    file = fopen("file2.txt", "r");
    printf("\n\nfile2:\n");
    read(file);
    fclose(file);

    return 0;
}

I got this output:

file1:
pos: 0 c: a (97)
pos: 1 c: b (98)
pos: 2 c: c (99)
pos: 3 c:
 (10)
pos: 5 c: d (100)
pos: 6 c: e (101)
pos: 7 c: f (102)
pos: 8 c:
 (10)
pos: 10

file2:
pos: 0 c: a (97) // something is going wrong here...
pos: -1 c: b (98)
pos: 0 c: c (99)
pos: 1 c:
 (10)
pos: 3 c: d (100)
pos: 4 c: e (101)
pos: 5 c: f (102)
pos: 6 c:
 (10)
pos: 8

I know that fpos_t is not meant to be interpreted by the programmer, because its representation is implementation-dependent. However, the example above illustrates the problem with fgetpos/fsetpos.

How is it possible that the newline sequence affects the internal position of the file, even before those characters are encountered?

miloszmaki asked Mar 26 '13 23:03


1 Answer

I would say the problem is probably caused by the second file confusing the implementation: it's being opened in text mode, but its contents don't meet the requirements for a text stream on this platform.

From the C standard:

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character

Your second file contains no newline sequences that are valid for this platform, since in text mode on Windows the library looks for \r\n and converts it to a single newline character internally. As a result, the implementation may not understand the line length properly, and get hopelessly confused when you try to move about in it.

Additionally,

Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment.

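Bear in mind that the library will not just read each byte from the file as you call fgetc; it will read the entire file (for one so small) into the stream's buffer and operate on that. That would explain how a newline sequence later in the file can affect the reported position before you have read that far.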

teppic answered Sep 23 '22 23:09