
Using buffers to read from unknown size file

I'm trying to read blocks from a file and I have a problem.

char* inputBuffer = new char[blockSize];
while (inputFile.read(inputBuffer, blockSize)) {
    int i = inputFile.gcount();
    // Do stuff
}

Suppose the block size is 1024 bytes and the file is 24.3 KiB. After 24 full blocks have been read, 0.3 KiB remain. I want to read that remainder too, which is why I call gcount(): it tells me how much of the buffer read(...) actually filled when that is less than a full block.
But on the 25th read, read(...) returns a value that tests false, so the program does not enter the loop body, evidently because fewer than blockSize bytes were left in the file. What should I do?

erandros asked Jun 23 '11 04:06



1 Answer

I think Konrad Rudolf, whom you mention in a comment on another answer, makes a good point about the problem with reading until eof: if some other error prevents you from ever reaching eof, you are stuck in an infinite loop. So take his advice, but modify it to address the problem you have identified. One way of doing it is as follows:

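bool okay = true;
while (okay) {
    // read() returns the stream; since C++11 its conversion to bool is
    // explicit, so it needs a cast before it can be assigned to a bool.
    okay = static_cast<bool>(inputFile.read(inputBuffer, blockSize));
    std::streamsize i = inputFile.gcount();  // bytes actually read
    if (i > 0) {
        // Do stuff with the first i bytes of inputBuffer
    }
}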

Edit: Since my answer has been accepted, I am editing it to be as useful as possible. It turns out my bool okay is unnecessary (see ferosekhanj's answer). It is better to test inputFile directly, which also has the advantage that the loop is skipped cleanly if the file did not open. So I think this is the canonical solution to this problem:

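inputFile.open( "test.txt", ios::binary );
while ( inputFile ) {   // false if open() failed or a previous read failed
    inputFile.read( inputBuffer, blockSize );
    std::streamsize i = inputFile.gcount();   // bytes actually read
    if( i > 0 ) {
        // Do stuff with the first i bytes of inputBuffer
    }
}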

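Now the last time you //Do stuff, i will be less than blockSize, unless the file length happens to be an exact multiple of blockSize. In that case the final read(...) extracts nothing, gcount() returns 0, and the if( i ) test skips the processing.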

Konrad Rudolf's answer here is also good. It has the advantage that .gcount() is called only once, outside the loop, but the disadvantage that the data processing really needs to go in a separate function to avoid duplication.

Bill Forster answered Oct 03 '22 01:10
