Using seekg() in text mode

Tags:

c++

While reading a simple ANSI-encoded text file in text mode on Windows, I came across some strange behaviour with seekg() and tellg(): any time I saved the value returned by tellg() (as a pos_type) and later sought back to it, I would wind up further ahead in the stream than where I left off.

Eventually I did a sanity check; even if I just do this...

#include <fstream>
#include <string>

int main()
{
   std::ifstream dataFile("myfile.txt",
         std::ifstream::in);
   if (dataFile.is_open() && !dataFile.fail())
   {
      while (dataFile.good())
      {
         std::string line;
         // Seek to the position we are already at; this should be a no-op.
         dataFile.seekg(dataFile.tellg());
         std::getline(dataFile, line);
      }
   }
}

...then further into the file, lines come out cut off partway through. Why exactly is this happening?

Michael asked Nov 21 '14 06:11


1 Answer

This issue is caused by how libstdc++ determines the current offset: it takes the file position returned by lseek64 and subtracts the number of characters still remaining in the buffer.

The buffer size is set from the return value of read, which for a text-mode file on Windows is the number of bytes placed into the buffer after end-of-line conversion (i.e. each 2-byte \r\n line ending is converted to \n; Windows also seems to append a spurious newline to the end of the file).

lseek64, however (which with mingw results in a call to _lseeki64), returns the current absolute file position. Once the two values are subtracted, you end up with an offset that is off by 1 for each remaining newline in the text file (plus 1 for the extra newline).

The following code should display the issue. You can even use a file with a single character and no newlines, because of the extra newline inserted by Windows.

#include <iostream>
#include <fstream>

int main()
{
  std::ifstream f("myfile.txt");

  for (char c; f.get(c);)
    std::cout << f.tellg() << ' ';
}

For a file containing just the single character a, I get the following output:

2 3

The first call to tellg is clearly off by 1. After the second call the file position is correct, because the end has been reached once the extra newline is taken into account.

Aside from opening the file in binary mode, you can circumvent the issue by disabling buffering:

#include <iostream>
#include <fstream>

int main()
{
  std::ifstream f;
  // Request an unbuffered stream; do this before opening the file.
  f.rdbuf()->pubsetbuf(nullptr, 0);
  f.open("myfile.txt");

  for (char c; f.get(c);)
    std::cout << f.tellg() << ' ';
}

but this is far from ideal.

Hopefully mingw / mingw-w64 or gcc can fix this, but first we'll need to determine who is responsible for fixing it. I suppose the underlying issue is with Microsoft's implementation of lseek, which should return appropriate values according to how the file has been opened.

user657267 answered Oct 12 '22 18:10