How much does "packing" structs affect performance?

Tags:

performance

It isn't my goal to start micro-optimizing, so if that's what this turns into, I'll gladly drop the question. But I'm about to make some design decisions and want to be better informed.

I am reading and processing a file format that contains numerous data structures, each documented in a well-defined format. I've represented them in code as structs.

Now, if I pack the structs to 1-byte alignment with #pragma pack(1), I can read the structures off the IO stream directly into the structs through struct pointers. This is convenient. If I don't pack the structures, I can either fread the fields one by one, or fread blocks at a time and reinterpret_cast the struct fields one by one, which will probably get old fast.
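For concreteness, here is a minimal sketch of the packed approach. The record layout (`NaturalHeader`/`PackedHeader`) and the helper `readPacked` are hypothetical, since the question doesn't give the real fields. `#pragma pack(push, 1)` / `#pragma pack(pop)` is accepted by MSVC, GCC and Clang, and the `static_assert` catches a compiler that ignores it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical 16-byte on-disk record header, for illustration only.
struct NaturalHeader {           // default alignment: the compiler may insert padding
    std::uint16_t type;
    std::uint32_t length;
    std::uint16_t flags;
    std::uint64_t offset;
};

#pragma pack(push, 1)            // 1-byte alignment: no padding between members
struct PackedHeader {
    std::uint16_t type;
    std::uint32_t length;
    std::uint16_t flags;
    std::uint64_t offset;
};
#pragma pack(pop)                // restore the previous packing for later code

// Fail the build if the pragma was silently ignored.
static_assert(sizeof(PackedHeader) == 16, "PackedHeader must match the 16-byte record");

// One fread per record, straight into the packed struct.
// Bytes are copied verbatim, so the values are only correct when the
// file's byte order matches the host's.
bool readPacked(std::FILE* f, PackedHeader& out) {
    assert(f != nullptr);
    return std::fread(&out, sizeof out, 1, f) == 1;
}
```

On typical x86-64 compilers `sizeof(NaturalHeader)` comes out as 24 because of the padding after `type` and `flags`, which is exactly why the unpacked struct can't simply be `fread` into.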

For reference, the structs will (potentially) be read by the thousands and could have some number crunching done on them. They're mostly composed of unsigned 16-bit integers (about 60%), unsigned 32-bit integers (about 30%) and some 64-bit integers.

So the question at hand is, do I...

Make tens of thousands of tiny fread calls?
Read chunks and copy over the relevant bytes?
Pack the structs and read directly into them?
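The first option might look like the following sketch (the `Header` struct and `readFieldByField` are hypothetical names). `FILE*` streams are buffered by default, so the many small `fread` calls mostly hit the stdio buffer rather than the OS, but each call still typically pays function-call and stream-locking overhead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical record header using the compiler's natural alignment.
struct Header {
    std::uint16_t type;
    std::uint32_t length;
    std::uint16_t flags;
    std::uint64_t offset;
};

// One small fread per field, in on-disk order.
// Like the packed approach, fields are taken in host byte order.
bool readFieldByField(std::FILE* f, Header& h) {
    assert(f != nullptr);
    return std::fread(&h.type,   sizeof h.type,   1, f) == 1 &&
           std::fread(&h.length, sizeof h.length, 1, f) == 1 &&
           std::fread(&h.flags,  sizeof h.flags,  1, f) == 1 &&
           std::fread(&h.offset, sizeof h.offset, 1, f) == 1;
}
```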


asked Apr 19 '13 16:04

user2285060

2 Answers

Ultimately, the performance difference between solution A and solution B can only be determined by benchmarking. Asking on the internet will give you varying results that may or may not reflect the reality in your case.

What happens when you "misalign" data is that the processor needs to do multiple reads [and the same applies to writes] for one piece of data. Exactly how much extra time that takes depends on the processor. Some processors don't handle it automatically, so the runtime system will trap the "bad read" and perform the read in an emulation layer [or, on some processors, simply kill the process for "unaligned memory access"]. Clearly, taking a trap, doing a couple of read operations and then returning to the calling code has a significant impact on performance: it can easily take hundreds of cycles longer than an aligned read.
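In C++ the portable way to read a value from a possibly misaligned address is `std::memcpy`; dereferencing a cast pointer is undefined behaviour and is exactly the access that can trap on strict-alignment processors. A minimal sketch (`loadU32` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reading a uint32_t from an arbitrary byte offset.
//
// Undefined behaviour, and a possible trap on strict-alignment CPUs:
//   std::uint32_t v = *reinterpret_cast<const std::uint32_t*>(buf + 1);
//
// std::memcpy has no alignment requirement. Compilers typically turn it into
// a single load on targets that allow unaligned access, and into a byte-wise
// sequence on targets that don't.
std::uint32_t loadU32(const unsigned char* buf, std::size_t offset) {
    assert(buf != nullptr);
    std::uint32_t v;
    std::memcpy(&v, buf + offset, sizeof v);
    return v;   // host byte order
}
```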

In the case of x86, it "works just like you'd expect", but with a penalty of typically 1 extra clock cycle [assuming the data is already in L1 cache]. One clock cycle isn't much on a modern processor, but if a loop runs 10000000000000 iterations and reads unaligned data n times per iteration, you have added n * 10000000000000 clock cycles to the execution time, which may be significant.

The other alternatives also have an impact on performance. Doing lots of small reads is likely A LOT slower than doing one large read. A conversion function is LIKELY better from a performance perspective.
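A sketch of the "one large read plus a conversion function" approach, assuming a hypothetical 16-byte record whose documented field offsets are 0, 2, 6 and 8. `decodeHeader` and `readHeaders` are illustrative names, not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical record: 16 bytes on disk, naturally aligned (and padded) in memory.
struct Header {
    std::uint16_t type;
    std::uint32_t length;
    std::uint16_t flags;
    std::uint64_t offset;
};
constexpr std::size_t kRecordSize = 16;   // on-disk size, not sizeof(Header)

// Conversion function: copy each field from its documented byte offset.
// memcpy is safe for any alignment; values come out in host byte order.
Header decodeHeader(const unsigned char* p) {
    Header h;
    std::memcpy(&h.type,   p + 0, sizeof h.type);
    std::memcpy(&h.length, p + 2, sizeof h.length);
    std::memcpy(&h.flags,  p + 6, sizeof h.flags);
    std::memcpy(&h.offset, p + 8, sizeof h.offset);
    return h;
}

// One large fread for up to `count` records, then convert in memory.
std::vector<Header> readHeaders(std::FILE* f, std::size_t count) {
    assert(f != nullptr);
    std::vector<unsigned char> buf(count * kRecordSize);
    // fread returns the number of complete records actually read.
    std::size_t got = std::fread(buf.data(), kRecordSize, count, f);
    std::vector<Header> out;
    out.reserve(got);
    for (std::size_t i = 0; i < got; ++i)
        out.push_back(decodeHeader(buf.data() + i * kRecordSize));
    return out;
}
```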

Again, please don't take this as a "given", you really need to compare the different solutions (or pick one, and if the performance doesn't suck, and the code isn't horrible looking, leave it at that). I'm fairly convinced you could find cases for every one of the three solutions you suggest being "best".
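If you do benchmark, something along these lines is a reasonable starting point. `bestOfMicros` and `sumU32` are hypothetical helpers; the example compares summing `uint32_t` values at an aligned offset (0) against a misaligned one (1) in the same buffer. Results will vary by CPU, compiler and optimisation flags.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Time `reps` runs of `work` and return the fastest run in microseconds.
// Taking the minimum reduces noise from other processes; build with
// optimisations enabled, or the numbers mean little.
template <class F>
long long bestOfMicros(F work, int reps) {
    assert(reps > 0);
    long long best = -1;
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        long long us =
            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        if (best < 0 || us < best) best = us;
    }
    return best;
}

// Example workload: sum the uint32_t values starting at byte `start`
// (0 = aligned for vector storage, 1 = misaligned).
// Callers should use the returned sum so the optimiser can't drop the loop.
std::uint64_t sumU32(const std::vector<unsigned char>& buf, std::size_t start) {
    std::uint64_t sum = 0;
    for (std::size_t i = start; i + 4 <= buf.size(); i += 4) {
        std::uint32_t v;
        std::memcpy(&v, buf.data() + i, sizeof v);
        sum += v;
    }
    return sum;
}
```

For example, with `std::vector<unsigned char> buf(64 << 20, 1);` you could compare `bestOfMicros([&] { s0 = sumU32(buf, 0); }, 5)` against `bestOfMicros([&] { s1 = sumU32(buf, 1); }, 5)`, then print `s0` and `s1`.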

Also bear in mind that #pragma pack is compiler-specific, and it's not easy to write macros that let you select between, for example, the "Microsoft" and "gcc" solutions. Edit: it would appear that more recent gcc versions do support this option, but not ALL compilers do.
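One way to wrap the compiler differences is a macro like the sketch below. `PACKED_STRUCT` is a hypothetical name; `__attribute__((packed))` (GCC/Clang) and `__pragma` (MSVC) are the real compiler-specific spellings. The `static_assert` is arguably the most important part: it turns a silently ignored packing request into a build error.

```cpp
#include <cstdint>

// Pick the packing syntax per compiler.
#if defined(__GNUC__) || defined(__clang__)
#  define PACKED_STRUCT(decl) decl __attribute__((packed))
#elif defined(_MSC_VER)
#  define PACKED_STRUCT(decl) __pragma(pack(push, 1)) decl __pragma(pack(pop))
#else
#  error "PACKED_STRUCT: no known packing syntax for this compiler"
#endif

// The struct body must not contain top-level commas (e.g. `int a, b;`),
// because the preprocessor would split it into several macro arguments.
PACKED_STRUCT(struct WireHeader {
    std::uint16_t type;
    std::uint32_t length;
    std::uint16_t flags;
    std::uint64_t offset;
});

// Whatever mechanism was used, fail the build if the layout isn't 16 bytes.
static_assert(sizeof(WireHeader) == 16, "WireHeader is not packed");
```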


answered Oct 11 '22 09:10

Mats Petersson

Per your comment on another answer, your code is intended to be platform-agnostic and the endianness of the file format is clearly specified. In this case, reading directly into a packed struct loses much of its clarity, because it requires an after-read endianness cleanup step, or else produces incorrect data on architectures whose endianness differs from the file format's.
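A sketch of endianness-independent decoding. It assumes, purely for illustration, a little-endian file format; `readLE16`/`readLE32`/`readLE64` are hypothetical helpers. Because the values are assembled with shifts rather than copied byte-for-byte, the result is the same on little- and big-endian hosts, and the input pointer needs no particular alignment.

```cpp
#include <cstdint>

// Assumption: the file stores integers little-endian (least significant byte first).
constexpr std::uint16_t readLE16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

constexpr std::uint32_t readLE32(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Low 32 bits come first in the file, high 32 bits second.
constexpr std::uint64_t readLE64(const unsigned char* p) {
    return  static_cast<std::uint64_t>(readLE32(p))
         | (static_cast<std::uint64_t>(readLE32(p + 4)) << 32);
}
```

For a big-endian format the same idea applies with the byte indices reversed.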

Assuming that you always know the number of bytes (probably from a struct type indicator in the file), I would suggest using a factory pattern where the created object's constructor knows how to pull bytes out of a memory buffer attribute by attribute. If the file is small enough, you can just read the entire thing into a buffer and then loop: factory-create each object and deserialize into it via its constructor. This way you control the endianness and let the compiler use its preferred struct alignment.
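A minimal sketch of that factory approach. Everything here is hypothetical: the type ids 1 and 2, the `Point` and `Range` record classes, `ByteReader`, and a little-endian layout of `[u16 type id][fields...]`. Each constructor pulls its own fields through a bounds-checked reader, so the in-memory objects keep the compiler's natural alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <vector>

// Bounds-checked cursor over a buffer; decodes little-endian (assumed byte order).
class ByteReader {
public:
    ByteReader(const unsigned char* data, std::size_t size)
        : p_(data), end_(data + size) { assert(data != nullptr || size == 0); }
    bool done() const { return p_ == end_; }
    std::uint16_t u16() { return static_cast<std::uint16_t>(take(2)); }
    std::uint32_t u32() { return static_cast<std::uint32_t>(take(4)); }
private:
    std::uint64_t take(std::size_t n) {
        if (static_cast<std::size_t>(end_ - p_) < n)
            throw std::runtime_error("truncated record");
        std::uint64_t v = 0;
        for (std::size_t i = n; i-- > 0;) v = (v << 8) | p_[i];  // last byte is most significant
        p_ += n;
        return v;
    }
    const unsigned char* p_;
    const unsigned char* end_;
};

// Hypothetical record types. Members are initialised in declaration order,
// which here matches the on-disk field order.
struct Record { virtual ~Record() = default; };

struct Point : Record {
    std::uint16_t x, y;
    explicit Point(ByteReader& r) : x(r.u16()), y(r.u16()) {}
};

struct Range : Record {
    std::uint32_t begin, end;
    explicit Range(ByteReader& r) : begin(r.u32()), end(r.u32()) {}
};

using Factory = std::function<std::unique_ptr<Record>(ByteReader&)>;

// Hypothetical type id -> constructor table.
const std::map<std::uint16_t, Factory>& factories() {
    static const std::map<std::uint16_t, Factory> table = {
        {1, [](ByteReader& r) { return std::unique_ptr<Record>(new Point(r)); }},
        {2, [](ByteReader& r) { return std::unique_ptr<Record>(new Range(r)); }},
    };
    return table;
}

// Loop over a whole-file buffer: read the type id, dispatch, repeat.
std::vector<std::unique_ptr<Record>> parseAll(const std::vector<unsigned char>& buf) {
    ByteReader r(buf.data(), buf.size());
    std::vector<std::unique_ptr<Record>> out;
    while (!r.done()) {
        auto it = factories().find(r.u16());
        if (it == factories().end()) throw std::runtime_error("unknown record type");
        out.push_back(it->second(r));
    }
    return out;
}
```

Adding a record type then means writing one constructor and one table entry. If the real format also stores each record's length, the loop could additionally check that each constructor consumed exactly that many bytes.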

answered Oct 11 '22 09:10

Mark B
