It isn't my goal to start micro-optimizing, so if that's what this turns into, I'll gladly drop the question. But I'm about to make some design decisions and want to be better informed.
I am reading and processing a file format that contains numerous data structures, each documented in a well-defined format. I've represented them in code as structs.
Now, if I pack the structs on 1-byte alignment with `#pragma pack(1)`, I can read the structures off the IO stream directly onto struct pointers. This is convenient. If I don't pack the structures, I can either `fread` the fields one by one, or `fread` blocks at a time and `reinterpret_cast` the struct fields one by one, which will probably get old fast.
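As a rough sketch of the first option, a packed struct can be filled straight from the stream with a single `fread`. The struct `RecordHeader`, its fields and the `read_headers` helper are hypothetical, not taken from the actual file format, and the sketch assumes the file's byte order matches the host's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical on-disk record: 2 + 4 + 8 = 14 bytes with no padding.
#pragma pack(push, 1)
struct RecordHeader {
    std::uint16_t type;
    std::uint32_t length;
    std::uint64_t offset;
};
#pragma pack(pop)

static_assert(sizeof(RecordHeader) == 14, "packing must remove all padding");

// Reads up to `count` consecutive records directly into `out` and returns the
// number of complete records read. No byte-order conversion is performed.
std::size_t read_headers(std::FILE* f, RecordHeader* out, std::size_t count) {
    assert(f != nullptr);
    return std::fread(out, sizeof(RecordHeader), count, f);
}
```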
For reference, the structs will (potentially) be read by the thousands and could have some number crunching done on them. They consist mostly of unsigned 16-bit integers (about 60%), unsigned 32-bit integers (about 30%) and some 64-bit integers.
So the question at hand is: do I pack the structs and `fread` them directly, `fread` the fields one by one, or `fread` blocks and convert the fields individually?
Padding makes things bigger; packing makes things smaller.
In a struct, the size of the structure is sometimes larger than the sum of its members' sizes because of structure padding. For example, if the members add up to 13 bytes but the struct occupies 16, then 3 bytes are wasted on padding. To avoid structure padding, you can use `#pragma pack` or a compiler-specific packing attribute (e.g. `__attribute__((packed))` in GCC and Clang).
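A small illustration of the difference, using hypothetical structs `Padded` and `Packed` with identical members. The padded size is implementation-defined (12 bytes is typical where `uint32_t` is 4-byte aligned), so only the packed size is checked exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Natural alignment: the compiler may insert padding between and after members.
struct Padded {
    std::uint8_t  a;  // 1 byte, typically followed by 3 padding bytes so b is 4-aligned
    std::uint32_t b;  // 4 bytes
    std::uint16_t c;  // 2 bytes, typically followed by 2 bytes of tail padding
};

// Same members with 1-byte packing: no padding at all.
#pragma pack(push, 1)
struct Packed {
    std::uint8_t  a;
    std::uint32_t b;
    std::uint16_t c;
};
#pragma pack(pop)

// Sum of the member sizes: 1 + 4 + 2 = 7 bytes.
constexpr std::size_t kMemberBytes =
    sizeof(std::uint8_t) + sizeof(std::uint32_t) + sizeof(std::uint16_t);

static_assert(sizeof(Packed) == kMemberBytes, "packed struct has no padding");
static_assert(sizeof(Padded) >= kMemberBytes, "padding can only add bytes");
```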
Ultimately, the performance difference between solution A and solution B can only be determined by benchmarking. Asking on the internet will give you varying results that may or may not reflect reality in your case.
What happens when you "misalign" data is that the processor needs to do multiple reads [and the same applies to writes] for one piece of data. Exactly how much extra time that takes depends on the processor: some processors don't handle it automatically, so the runtime system will trap the "bad read" and perform the read in an emulation layer [or, on some processors, simply kill the process for "unaligned memory access"]. Clearly, taking a trap, doing a couple of read operations and then returning to the calling code has a significant performance impact: it can easily take hundreds of cycles longer than an aligned read.
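In C++, the portable way to read a possibly misaligned value is to copy its bytes with `std::memcpy` instead of dereferencing a `reinterpret_cast` pointer, which is undefined behavior when the address is misaligned. A minimal sketch, with a hypothetical helper `load_u32`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loads a uint32_t from any byte offset, aligned or not.
// std::memcpy has no alignment requirement; optimizing compilers typically emit
// a single load where the target allows unaligned access, and a byte-wise
// sequence where it would otherwise trap.
// The bytes are interpreted in the host's native byte order.
std::uint32_t load_u32(const unsigned char* buf, std::size_t size, std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= size);
    std::uint32_t value;
    std::memcpy(&value, buf + offset, sizeof value);
    return value;
}
```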
In the case of x86, it "works just like you'd expect", but with a penalty of typically 1 extra clock cycle [assuming the data is already in L1 cache]. One clock cycle isn't much on a modern processor, but if a loop runs 10000000000000 iterations and reads unaligned data n times per iteration, you have added n * 10000000000000 clock cycles to the execution time, which may be significant.
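Because the penalty varies by processor, it is worth measuring on your own hardware. This is only a crude timing sketch (the helpers `sum_u32` and `time_sum` are hypothetical); an optimizing compiler may vectorize or restructure these loops, so a dedicated harness such as Google Benchmark is a better choice for serious measurements.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sums `count` uint32_t values stored back to back starting at byte `offset`.
// Vector storage comes from operator new, so offset 0 is aligned for uint32_t;
// offset 1 makes every load misaligned.
std::uint64_t sum_u32(const std::vector<unsigned char>& buf, std::size_t offset, std::size_t count) {
    assert(offset + count * sizeof(std::uint32_t) <= buf.size());
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t v;
        std::memcpy(&v, buf.data() + offset + i * sizeof v, sizeof v);
        total += v;
    }
    return total;
}

// Times `reps` passes and returns elapsed nanoseconds. The checksum is written
// out so the compiler cannot discard the work.
long long time_sum(const std::vector<unsigned char>& buf, std::size_t offset,
                   std::size_t count, int reps, std::uint64_t* checksum) {
    auto start = std::chrono::steady_clock::now();
    std::uint64_t total = 0;
    for (int r = 0; r < reps; ++r) total += sum_u32(buf, offset, count);
    auto stop = std::chrono::steady_clock::now();
    *checksum = total;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Call `time_sum` with offset 0 and offset 1 on the same buffer and compare the two times; the checksums should match.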
The other alternatives also have an impact on performance. Doing a lot of small reads is likely A LOT slower than doing one large read. A conversion function [one large read, then extracting the fields from the buffer] is LIKELY better from a performance perspective.
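A sketch of the "one large read plus a conversion function" approach. `Record`, `decode_record` and `read_records` are hypothetical names, the 14-byte on-disk layout is an assumption, and the fields are copied in the host's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical record, left at the compiler's natural alignment (not packed).
struct Record {
    std::uint16_t kind;
    std::uint32_t size;
    std::uint64_t position;
};

constexpr std::size_t kRecordBytes = 2 + 4 + 8;  // assumed on-disk size, no padding

// Conversion function: copies each field out of the raw bytes at its file offset.
Record decode_record(const unsigned char* p) {
    Record r;
    std::memcpy(&r.kind, p, sizeof r.kind);
    std::memcpy(&r.size, p + 2, sizeof r.size);
    std::memcpy(&r.position, p + 6, sizeof r.position);
    return r;
}

// One large fread for `count` records, then convert each one in memory.
std::vector<Record> read_records(std::FILE* f, std::size_t count) {
    assert(f != nullptr);
    if (count == 0) return {};
    std::vector<unsigned char> raw(count * kRecordBytes);
    std::size_t got = std::fread(raw.data(), kRecordBytes, count, f);
    std::vector<Record> out;
    out.reserve(got);
    for (std::size_t i = 0; i < got; ++i)
        out.push_back(decode_record(raw.data() + i * kRecordBytes));
    return out;
}
```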
Again, please don't take this as a "given": you really need to compare the different solutions (or pick one, and if the performance doesn't suck and the code isn't horrible to look at, leave it at that). I'm fairly convinced you could find cases where each of the three solutions you suggest is "best".
Also bear in mind that `#pragma pack` is compiler-specific, and it's not easy to write macros that select between the "Microsoft" and "gcc" solutions, for example. Edit: it appears that more recent gcc versions do support this option, but not ALL compilers do.
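For MSVC, GCC and Clang specifically, one possible approach is to wrap the pragma in macros: MSVC's `__pragma` keyword on one side, the standard C++11 `_Pragma` operator on the other. The macro names `PACK_BEGIN`/`PACK_END` and the `Wire` struct are hypothetical, and other compilers may accept neither form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper macros. A plain #pragma cannot appear inside a macro,
// so MSVC's __pragma keyword or the standard _Pragma operator is used instead.
#if defined(_MSC_VER)
#define PACK_BEGIN __pragma(pack(push, 1))
#define PACK_END   __pragma(pack(pop))
#else
#define PACK_BEGIN _Pragma("pack(push, 1)")
#define PACK_END   _Pragma("pack(pop)")
#endif

PACK_BEGIN
struct Wire {
    std::uint16_t id;
    std::uint32_t value;
};
PACK_END

// Fails to compile if the packing was silently ignored.
static_assert(sizeof(Wire) == 6, "Wire must have no padding");
```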
Per your comment on another answer, your code is intended to be platform-agnostic and the endianness of the file format is clearly specified. In this case, reading directly into a packed struct loses much of its clarity, because it requires an after-read endian cleanup step, or else it produces incorrect data on architectures whose endianness differs from the file format's.
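One way to avoid the cleanup step is to assemble each integer from individual bytes with shifts, which gives the same result on any host and places no alignment requirement on the buffer. This sketch assumes a little-endian file format (reverse the byte order for a big-endian one); the `read_le16`/`read_le32`/`read_le64` helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Decode little-endian integers from raw bytes, independent of host endianness.
std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

std::uint64_t read_le64(const unsigned char* p) {
    // Low 32 bits come first in little-endian order.
    return static_cast<std::uint64_t>(read_le32(p)) |
           (static_cast<std::uint64_t>(read_le32(p + 4)) << 32);
}
```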
Assuming that you always know the number of bytes (probably from a struct type indicator in the file), I would suggest using a factory pattern where the created object's constructor knows how to pull bytes out of a memory buffer attribute by attribute. If the file is small enough, you can just read the entire thing into a buffer and then loop: factory-create, deserialize into the object via its constructor. This way you control the endianness and allow the compiler its desired struct alignment.
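A minimal sketch of that factory approach, assuming a little-endian file in which each record starts with a 16-bit type tag. All names (`ByteReader`, `PointRecord`, `ExtentRecord`, `make_record`, `parse_all`) and both record layouts are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Cursor over an in-memory copy of the file; pulls little-endian fields one at a time.
class ByteReader {
public:
    explicit ByteReader(const std::vector<unsigned char>& data) : data_(data) {}
    bool done() const { return pos_ >= data_.size(); }
    std::uint16_t u16() { return static_cast<std::uint16_t>(get(2)); }
    std::uint32_t u32() { return static_cast<std::uint32_t>(get(4)); }
    std::uint64_t u64() { return get(8); }

private:
    std::uint64_t get(std::size_t n) {
        if (pos_ + n > data_.size()) throw std::out_of_range("truncated record");
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < n; ++i)
            v |= static_cast<std::uint64_t>(data_[pos_ + i]) << (8 * i);
        pos_ += n;
        return v;
    }
    const std::vector<unsigned char>& data_;
    std::size_t pos_ = 0;
};

struct Record {
    virtual ~Record() = default;
    virtual std::uint16_t type() const = 0;
};

// Each constructor deserializes its own fields; members are initialized in
// declaration order, which matches the assumed on-disk order.
struct PointRecord : Record {
    std::uint16_t x, y;
    explicit PointRecord(ByteReader& r) : x(r.u16()), y(r.u16()) {}
    std::uint16_t type() const override { return 1; }
};

struct ExtentRecord : Record {
    std::uint32_t length;
    std::uint64_t offset;
    explicit ExtentRecord(ByteReader& r) : length(r.u32()), offset(r.u64()) {}
    std::uint16_t type() const override { return 2; }
};

// Factory: reads the type tag, then lets the matching constructor consume the rest.
std::unique_ptr<Record> make_record(ByteReader& r) {
    switch (r.u16()) {
        case 1: return std::make_unique<PointRecord>(r);
        case 2: return std::make_unique<ExtentRecord>(r);
        default: throw std::runtime_error("unknown record type");
    }
}

std::vector<std::unique_ptr<Record>> parse_all(const std::vector<unsigned char>& file) {
    ByteReader r(file);
    std::vector<std::unique_ptr<Record>> out;
    while (!r.done()) out.push_back(make_record(r));
    return out;
}
```

The structs keep the compiler's natural alignment, and the byte order is decided in one place (`ByteReader::get`) rather than in every record type.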