
Reading binary istream byte by byte

I was attempting to read a binary file byte by byte using an ifstream. I've used istream methods like get() before to read entire chunks of a binary file at once without a problem. But my current task lends itself to going byte by byte and relying on the buffering in the I/O system to make it efficient. The problem is that I seemed to reach the end of the file several bytes sooner than I should have. So I wrote the following test program:

#include <iostream>
#include <fstream>

int main() {
    typedef unsigned char uint8;
    std::ifstream source("test.dat", std::ios_base::binary);
    while (source) {
        std::ios::pos_type before = source.tellg();
        uint8 x;
        source >> x;
        std::ios::pos_type after = source.tellg();
        std::cout << before << ' ' << static_cast<int>(x) << ' '
                  << after << std::endl;
    }
    return 0;
}

This dumps the contents of test.dat, one byte per line, showing the file position before and after.

Sure enough, if my file happens to have the two-byte sequence 0x0D-0x0A (which corresponds to carriage return and line feed), those bytes are skipped.

  • I've opened the stream in binary mode. Shouldn't that prevent it from interpreting line separators?
  • Do extraction operators always use text mode?
  • What's the right way to read byte by byte from a binary istream?

MSVC++ 2008 on Windows.

Adrian McCarthy asked Apr 01 '11 12:04


4 Answers

The >> extractors are for formatted input; they skip whitespace (by default). For single-character unformatted input, you can use istream::get() (returns an int: either EOF if the read fails, or a value in the range [0, UCHAR_MAX]) or istream::get(char&) (puts the character read in the argument and returns something which converts to bool: true if the read succeeds, false if it fails).

James Kanze answered Nov 13 '22 06:11


There is a read() member function, to which you pass the number of bytes to read.

stefaanv answered Nov 13 '22 04:11


Why are you using formatted extraction, rather than .read()?

Lightness Races in Orbit answered Nov 13 '22 04:11


source.get()

will give you a single byte. It is an unformatted input function. operator>> is a formatted input function, which skips leading whitespace characters by default (that is, while the skipws flag is set).

Serge Dundich answered Nov 13 '22 06:11