
Reading Binary File into a Structure (C++)

I'm having trouble reading a binary file into my structure. The structure is this:

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

By my count it is 37 bytes (25 bytes for the char array, plus 4 bytes per integer). My .dat file is 185 bytes. It holds 5 students, each with 3 integer grades, so each student takes up 37 bytes (37 * 5 = 185).

It looks something like this in plain text format:

Bart Simpson          75   65   70
Ralph Wiggum          35   60   44
Lisa Simpson          100  98   91
Martin Prince         99   98   99
Milhouse Van Houten   80   87   79

I'm able to read each of the records individually by using this code:

Student stud;

fstream file;
file.open("quizzes.dat", ios::in | ios::out | ios::binary);

if (file.fail())
{
    cout << "ERROR: Cannot open the file..." << endl;
    exit(1);   // non-zero exit status signals failure
}

file.read(stud.name, sizeof(stud.name));
file.read(reinterpret_cast<char *>(&stud.quiz1), sizeof(stud.quiz1));
file.read(reinterpret_cast<char *>(&stud.quiz2), sizeof(stud.quiz2));
file.read(reinterpret_cast<char *>(&stud.quiz3), sizeof(stud.quiz3));

while(!file.eof())
{
    cout << left 
         << setw(25) << stud.name
         << setw(5)  << stud.quiz1
         << setw(5)  << stud.quiz2
         << setw(5)  << stud.quiz3
         << endl;

    // Reading the next record
    file.read(stud.name, sizeof(stud.name));
    file.read(reinterpret_cast<char *>(&stud.quiz1), sizeof(stud.quiz1));
    file.read(reinterpret_cast<char *>(&stud.quiz2), sizeof(stud.quiz2));
    file.read(reinterpret_cast<char *>(&stud.quiz3), sizeof(stud.quiz3));
}

That gives me nice-looking output, but I want to read in one whole structure at a time rather than one member at a time. This is the code I believe should accomplish that, but it doesn't work (the output follows it):

(Omitting the parts that stay the same: opening the file, the structure declaration, etc.)

file.read(reinterpret_cast<char *>(&stud), sizeof(stud));

while(!file.eof())
{
    cout << left 
         << setw(25) << stud.name
         << setw(5)  << stud.quiz1
         << setw(5)  << stud.quiz2
         << setw(5)  << stud.quiz3
         << endl;

    file.read(reinterpret_cast<char *>(&stud), sizeof(stud));
}

OUTPUT:

Bart Simpson             16640179201818317312
ph Wiggum                288358417665884161394631027
impson                   129184563217692391371917853806
ince                     175193530917020655191851872800

The only part it doesn't mess up is the first name; after that it all goes downhill. I've tried everything and I have no idea what is wrong. I've even searched through the books I have and couldn't find anything. The examples in there look like what I have and they work, but for some odd reason mine doesn't. I did a file.get(ch) (ch being a char) at byte 25 and it returned K, which is the ASCII character for 75, the first test score, so everything is where it should be. It's just not reading into my structures properly.

Any help would be greatly appreciated, I'm just stuck with this one.

EDIT: After receiving such a large amount of unexpected and awesome input from you guys, I've decided to take your advice and stick with reading in one member at a time. I made things cleaner and smaller by using functions. Thank you once again for providing such quick and enlightening input. It's much appreciated.

If you're interested in a workaround that's not recommended by most, scroll towards the bottom, to the 3rd answer by user1654209. That workaround works flawlessly, but read all the comments to see why it's not favored.

B.K. asked Mar 21 '13 08:03


5 Answers

Your struct has almost certainly been padded to preserve the alignment of its content. This means that it will not be 37 bytes, and that mismatch causes the reading to go out of sync. Looking at the way each string is losing 3 characters, it seems that it has been padded to 40 bytes.

As the padding is likely to be between the string and the integers, not even the first record reads correctly.
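To check the padding claim on a particular compiler, a small sketch like the following prints the struct size and the offset of each member. The helper name `print_layout` is made up for this illustration, and the 40-byte figure is typical for platforms with a 4-byte, 4-byte-aligned `int` but is not guaranteed by the language.

```cpp
#include <cstddef>   // offsetof
#include <iostream>

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Hypothetical helper: prints where the compiler actually placed each member.
void print_layout(std::ostream& out)
{
    out << "sizeof(Student)  = " << sizeof(Student) << '\n'
        << "offset of name   = " << offsetof(Student, name) << '\n'
        << "offset of quiz1  = " << offsetof(Student, quiz1) << '\n'
        << "offset of quiz2  = " << offsetof(Student, quiz2) << '\n'
        << "offset of quiz3  = " << offsetof(Student, quiz3) << '\n';
}
```

Calling `print_layout(std::cout)` shows whether `quiz1` starts at byte 25 (no padding) or at a later, aligned offset (padding inserted after `name`).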

In this case I would recommend not attempting to read your data as a binary blob, and sticking to reading individual fields. It's far more robust, especially if you ever want to alter your structure.

JasonD answered Oct 04 '22 13:10


Without seeing the code that writes the data, I'm guessing that you write the data the way you read it in the first example, each element one by one. Then each record in the file will indeed be 37 bytes.

However, since the compiler pads structures to put members on nice boundaries for optimization reasons, your structure is 40 bytes. So when you read the complete structure in a single call, then you actually read 40 bytes at a time, which means that your reading will go out of phase with the actual records in the file.
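The "out of phase" effect can be worked through with a little arithmetic. The sketch below assumes the sizes discussed here (37-byte records in a 185-byte file, 40 bytes consumed per whole-struct read); the constant and function names are invented for the illustration.

```cpp
#include <cstddef>
#include <iostream>

// Assumed sizes from the discussion above.
constexpr std::size_t kFileRecord = 37;   // bytes per record in the file
constexpr std::size_t kStructSize = 40;   // bytes consumed per whole-struct read
constexpr std::size_t kFileSize   = 185;  // total file size

// How many bytes into file record k the k-th whole-struct read begins.
constexpr std::size_t drift(std::size_t k)
{
    return k * kStructSize - k * kFileRecord;
}

// Lists every read that can be fully satisfied before the file runs out.
void print_drift(std::ostream& out)
{
    for (std::size_t k = 0; (k + 1) * kStructSize <= kFileSize; ++k)
    {
        out << "read " << k << " starts " << drift(k)
            << " bytes into record " << k << '\n';
    }
}
```

Under these assumptions only four reads complete (4 * 40 = 160 bytes, with 25 bytes left over), and they start 0, 3, 6 and 9 bytes into their records. That is consistent with the four output lines in the question, where the names lose 0, 3, 6 and 9 leading characters ("Bart Simpson", "ph Wiggum", "impson", "ince").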

You either have to re-implement the writing to write the complete structure in one go, or use the first method of reading where you're reading one member field at a time.

Some programmer dude answered Oct 04 '22 13:10


A simple workaround is to pack your structure to 1-byte alignment.

Using GCC:

struct __attribute__((packed)) Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

Using MSVC:

#pragma pack(push, 1) //set padding to 1 byte, saves previous value
struct  Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};
#pragma pack(pop) //restore previous pack value

EDIT: As user ahans states, pragma pack has been supported by gcc since version 2.7.2.3 (released in 1997), so it seems safe to use pragma pack as the only packing notation if you are targeting msvc and gcc.

user18428 answered Oct 04 '22 14:10


As you've already found out, the padding is the issue here. Also, as others have suggested, the proper way of solving this is to read each member individually as you've done in your example. I don't expect this to cost much more, performance-wise, than reading the whole thing at once. However, if you still want to go ahead and read it all at once, you can tell the compiler to do the padding differently:

#pragma pack(push, 1)
struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};
#pragma pack(pop)

With #pragma pack(push, 1) you tell the compiler to save the current pack value on an internal stack and use a pack value of 1 thereafter. This means you get an alignment of 1 byte, which means no padding at all in this case. With #pragma pack(pop) you tell the compiler to get the last value from the stack and use this thereafter, thereby restoring the behavior the compiler used before the definition of your struct.

While #pragma usually indicates non-portable, compiler-dependent features, this one works at least with GCC and Microsoft VC++.
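The effect of the push/pop pair can be checked at compile time. The sketch below is an illustration only, with made-up struct names (`PaddedStudent`, `PackedStudent`, `AfterPop`); it assumes a compiler that honors `#pragma pack`, such as GCC, Clang, or MSVC.

```cpp
#include <cstddef>   // offsetof

// Declared before the pragma: default alignment, so padding is allowed.
struct PaddedStudent
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

#pragma pack(push, 1)   // save the current pack value, then pack to 1 byte
struct PackedStudent
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};
#pragma pack(pop)       // restore the saved pack value

// Declared after the pop: default alignment applies again.
struct AfterPop
{
    char c;
    int i;
};

static_assert(sizeof(PackedStudent) == 25 + 3 * sizeof(int),
              "packed struct should contain no padding");
static_assert(sizeof(PaddedStudent) >= sizeof(PackedStudent),
              "default layout is never smaller than the packed one");
static_assert(offsetof(AfterPop, i) % alignof(int) == 0,
              "after pop, int members are naturally aligned again");
```

One design consideration: the `int` members of a packed struct may sit at unaligned addresses, so access can be slower on some platforms, and taking a pointer or reference to such a member is best avoided.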

ahans answered Oct 04 '22 13:10


There is more than one way to solve the problem in this thread. Here is a solution based on a union of a struct and a char buffer:

#include <cstdlib>
#include <fstream>
#include <sstream>
#include <iomanip>
#include <string>

using namespace std;    // the code below uses fstream and ios unqualified

/*
This is the main idea of the technique: put the struct
inside a union, together with a char array that has as
many chars as the struct occupies.

The union causes field (of type sStudent) and buf to be
at the exact same place in memory. They overlap each other!
*/
union uStudent
{
    struct sStudent
    {
        char name[25];
        int quiz1;
        int quiz2;
        int quiz3;
    } field;

    char buf[ sizeof(sStudent) ];    // sizeof calcs the number of chars needed
};

void create_data_file(fstream& file, uStudent* oStudent, int idx)
{
    if (idx < 0)
    {
        // index has moved past the beginning of the oStudent array. Return to start writing.
        return;
    }

    // have not yet reached idx = -1. Head recurse, so records are written in order 0..idx
    create_data_file(file, oStudent, idx - 1);

    // write a record
    file.write(oStudent[idx].buf, sizeof(uStudent));

    // return to write another record or to finish
    return;
}


std::string read_in_data_file(std::fstream& file, std::stringstream& strm_buf)
{
    // allocate a buffer of the correct size
    uStudent temp_student;

    // read in to buffer
    file.read( temp_student.buf, sizeof(uStudent) );

    // at end of file?
    if (file.eof())
    {
        // finished
        return strm_buf.str();
    }

    // not at end of file. Stuff buf for display
    strm_buf << std::setw(25) << std::left << temp_student.field.name;
    strm_buf << std::setw(5) << std::right << temp_student.field.quiz1;
    strm_buf << std::setw(5) << std::right << temp_student.field.quiz2;
    strm_buf << std::setw(5) << std::right << temp_student.field.quiz3;
    strm_buf << std::endl;

    // tail recurse to read the next record (or detect end of file)
    return read_in_data_file(file, strm_buf);
}



std::string quiz(void)
{

    /*
    declare and initialize array of uStudent to facilitate
    writing out the data file and then demonstrating
    reading it back in.
    */
    uStudent oStudent[] =
    {
        {"Bart Simpson",          75,   65,   70},
        {"Ralph Wiggum",          35,   60,   44},
        {"Lisa Simpson",         100,   98,   91},
        {"Martin Prince",         99,   98,   99},
        {"Milhouse Van Houten",   80,   87,   79}

    };




    fstream file;

    // ios::trunc causes the file to be created if it does not already exist.
    // ios::trunc also causes the file to be empty if it does already exist.
    file.open("quizzes.dat", ios::in | ios::out | ios::binary | ios::trunc);

    if ( ! file.is_open() )
    {
        // report the failure to the caller through the returned display string
        return "File did not open\n";
    }


    // create the data file
    int num_elements = sizeof(oStudent) / sizeof(uStudent);
    create_data_file(file, oStudent, num_elements - 1);

    // Don't forget
    file.flush();

    /*
    We wrote actual integers. So, you cannot check the file so
    easily by just using a common text editor such as Windows Notepad.

    You would need an editor that shows hex values or something similar.
    An integrated development environment (IDE) is likely to have such
    an editor, though not always.
    */


    /*
    Now, read the file back in for display. Reading into a string buffer
    for display all at once. Can modify code to display the string buffer
    wherever you want.
    */

    // make sure at beginning of file
    file.seekg(0, ios::beg);

    std::stringstream strm_buf;
    strm_buf.str( read_in_data_file(file, strm_buf) );

    file.close();

    return strm_buf.str();
}

Call quiz() and receive a string formatted for display to std::cout, writing to a file, or whatever.

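The main idea is that all the items inside a union start at the same address in memory. So you can have a char or wchar_t buf that is the same size as the struct you want to write to or read from a file. And notice that no casts are needed. There is not one cast in the code.

I also did not have to worry about padding, because the file is written and read back through the same (padded) union, so each record on disk is sizeof(uStudent) bytes. A file written field by field, like the 37-byte records in the question, would not match this layout.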

For those who do not like recursion, sorry. Working it out with recursion is easier and less error prone for me. Maybe not easier for others? The recursions can be converted to loops. And they would need to be converted to loops for very large files.

For those who like recursions, this is yet another instance of using recursion.

I don't claim that using a union is the best solution, only that it is a solution. Maybe you like it?

Indinfer answered Oct 04 '22 14:10