C++ struct alignment and STL vectors

Tags: c++, stl

I have a legacy data structure that's 672 bytes long. These structs are stored in a file, sequentially, and I need to read them in.

While I can read them in one-by-one, it would be nice to do this:

// I know in advance how many structs to read in
vector<MyStruct> bunchOfStructs;
bunchOfStructs.resize(numberOfStructs);

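ifstream ifs;
ifs.open("file.dat", ios::binary);  // binary mode: no newline translation
if (ifs) {
    // read() takes a char*, so the struct pointer has to be cast
    ifs.read(reinterpret_cast<char*>(&bunchOfStructs[0]), sizeof(MyStruct) * numberOfStructs);
}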

This works, but I think it only works because the data structure size happens to be evenly divisible by my compiler's struct alignment padding. I suspect it'll break on another compiler or platform.

The alternative would be to use a for loop to read in each struct one-at-a-time.

The question --> When do I have to be concerned about data alignment? Does dynamically allocated memory in a vector use padding or does STL guarantee that the elements are contiguous?

Nate asked Feb 21 '10 23:02



2 Answers

The standard requires you to be able to create an array of a struct type. When you do so, the array is required to be contiguous. That means, whatever size is allocated for the struct, it has to be one that allows you to create an array of them. To ensure that, the compiler can allocate extra space inside the structure, but cannot require any extra space between the structs.
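A minimal sketch of that point, using a made-up struct `Sample` and a hypothetical helper `arrayIsContiguous` (neither is from the question): `sizeof` already includes any padding the compiler added, so consecutive array elements sit exactly `sizeof(Sample)` bytes apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct, for illustration only. Most compilers add trailing
// padding after `tag` so that `value` stays aligned in every array element.
struct Sample {
    double value;
    char tag;
};

// Padding is counted in sizeof, so sizeof is a multiple of the alignment.
static_assert(sizeof(Sample) % alignof(Sample) == 0,
              "sizeof must be a multiple of alignof");

// Returns true if each element starts exactly sizeof(Sample) bytes after the
// previous one, i.e. there is no extra space between array elements.
inline bool arrayIsContiguous() {
    Sample items[4] = {};
    for (std::size_t i = 0; i + 1 < 4; ++i) {
        const char *a = reinterpret_cast<const char *>(&items[i]);
        const char *b = reinterpret_cast<const char *>(&items[i + 1]);
        if (b - a != static_cast<std::ptrdiff_t>(sizeof(Sample))) {
            return false;
        }
    }
    return true;
}
```

Calling `assert(arrayIsContiguous());` from `main` should pass on any conforming compiler, whatever the actual amount of padding is.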

The space for the data in a vector is (normally) allocated with ::operator new (via an Allocator class), and ::operator new is required to allocate space that's properly aligned to store any type that does not request extended alignment.

You could supply your own Allocator and/or overload ::operator new -- but if you do, your version is still required to meet the same requirements, so it won't change anything in this respect.
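The same property can be checked for a vector's storage. This is only a sketch: `Record` is a hypothetical stand-in for the legacy struct, and `vectorStorageIsUsable` is an illustrative helper name, not anything from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the legacy struct (672 bytes when int is 4 bytes).
struct Record {
    int id;
    char payload[668];
};

// Returns true if the vector's buffer is aligned for Record and its elements
// are laid out back to back, sizeof(Record) bytes apart.
inline bool vectorStorageIsUsable(std::size_t count) {
    std::vector<Record> v(count);
    if (v.size() < 2) {
        return true;  // nothing to compare
    }
    const auto addr = reinterpret_cast<std::uintptr_t>(v.data());
    const bool aligned = addr % alignof(Record) == 0;

    const char *first = reinterpret_cast<const char *>(&v[0]);
    const char *second = reinterpret_cast<const char *>(&v[1]);
    const bool contiguous =
        second - first == static_cast<std::ptrdiff_t>(sizeof(Record));

    return aligned && contiguous;
}
```

With the default allocator, `vectorStorageIsUsable(10)` is expected to return true, which is why a single bulk `read` into `&v[0]` is layout-safe on the machine that wrote the file.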

In other words, exactly what you want is required to work as long as the data in the file was created in essentially the same way you're trying to read it back in. If it was created on another machine or with a different compiler (or even the same compiler with different flags) you have a fair number of potential problems -- you might get differences in endianness, padding in the struct, and so on.
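One way to catch some of these mismatches early is to state the layout assumptions in code. The sketch below assumes a layout of three 32-bit integers followed by raw bytes; `LegacyRecord` and `hostIsLittleEndian` are illustrative names, and the real field list of the legacy struct is not known from the question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumed layout, for illustration only: three 32-bit integers plus raw bytes.
struct LegacyRecord {
    std::int32_t x, y, z;
    char stuff[672 - 3 * sizeof(std::int32_t)];
};

// Fail the build if this compiler's padding changes the expected size.
static_assert(sizeof(LegacyRecord) == 672,
              "struct size differs from the 672-byte on-disk record");

// A bulk read() into the struct is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<LegacyRecord>::value,
              "LegacyRecord must be trivially copyable");

// Runtime probe: true if the least significant byte is stored first.
inline bool hostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

These checks catch size and byte-order differences, but not fields that were reordered or padded differently inside a struct of the same total size.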

Edit: Given that you don't know whether the structs have been written out in the format expected by the compiler, you not only need to read the structs one at a time -- you really need to read the items in the structs one at a time, then put each into a temporary struct, and finally add that filled-in struct to your collection.

Fortunately, you can overload operator>> to automate most of this. It doesn't improve speed (for example), but it can keep your code cleaner:

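#include <algorithm>
#include <cassert>
#include <fstream>
#include <iterator>
#include <vector>

struct whatever {
    int x, y, z;
    char stuff[672 - 3 * sizeof(int)];

    // Read one field at a time. The file is binary, so use unformatted
    // read() on the ints; formatted >> would try to parse text digits.
    // This assumes the ints were written in the host's size and byte order.
    friend std::istream &operator>>(std::istream &is, whatever &w) {
        is.read(reinterpret_cast<char *>(&w.x), sizeof(w.x));
        is.read(reinterpret_cast<char *>(&w.y), sizeof(w.y));
        is.read(reinterpret_cast<char *>(&w.z), sizeof(w.z));
        return is.read(w.stuff, sizeof(w.stuff));
    }
};

int main(int argc, char **argv) {
    std::vector<whatever> data;

    assert(argc > 1);

    // Binary mode avoids newline translation on platforms that perform it.
    std::ifstream infile(argv[1], std::ios::binary);

    std::copy(std::istream_iterator<whatever>(infile),
              std::istream_iterator<whatever>(),
              std::back_inserter(data));
    return 0;
}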
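If the file's byte order is known, each integer can be decoded from individual bytes so the result does not depend on the reading machine. The sketch below assumes 32-bit little-endian integers in the file, which the question does not confirm; `readU32LE` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Assumption: the file stores 32-bit integers least significant byte first.
// Returns false if fewer than four bytes could be read.
inline bool readU32LE(std::istream &is, std::uint32_t &out) {
    unsigned char b[4];
    if (!is.read(reinterpret_cast<char *>(b), sizeof b)) {
        return false;
    }
    // Assemble the value from bytes, independent of the host's byte order.
    out = static_cast<std::uint32_t>(b[0]) |
          (static_cast<std::uint32_t>(b[1]) << 8) |
          (static_cast<std::uint32_t>(b[2]) << 16) |
          (static_cast<std::uint32_t>(b[3]) << 24);
    return true;
}
```

An `operator>>` for the struct could call a helper like this once per integer field instead of reading the raw bytes straight into the member.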
Jerry Coffin answered Oct 18 '22 04:10



For your existing file, your best bet is to figure out its file format and read each field in individually, reading and discarding any alignment (padding) bytes along the way.

It's best not to make any assumptions about struct alignment.
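As a rough sketch of reading fields individually and skipping padding, assume a made-up on-disk layout of a 1-byte tag, 3 padding bytes, and a 4-byte name. `Entry` and `readEntry` are hypothetical names; the real layout has to come from the file format itself.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// In-memory form. Its own padding does not matter, because nothing is
// read into the struct as a whole.
struct Entry {
    char tag;
    char name[4];
};

// Assumed on-disk layout: 1-byte tag, 3 padding bytes, 4-byte name.
inline bool readEntry(std::istream &is, Entry &e) {
    is.read(&e.tag, 1);
    is.ignore(3);                     // discard the padding bytes
    is.read(e.name, sizeof e.name);
    return static_cast<bool>(is);     // false if any read came up short
}
```

`std::istream::ignore` extracts and discards the given number of characters, so the padding never reaches the struct.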

To save new data to a file, you could use something like Boost.Serialization.

Brian R. Bondy answered Oct 18 '22 03:10
