Our headers use <code>#pragma pack(1)</code> around most of our structs (used for net and file I/O). I understand that it changes the alignment of structs from the default of 8 bytes, to an alignment of 1 byte. Assuming that everything is run in 32-bit Linux (perhaps Windows too), is there any performance hit that comes from this packing alignment? I'm not concerned about portability for libraries, but more with compatibility of file and network I/O with different #pragma packs, and performance issues.

Are there performance issues when using pragma pack(1)?

1 Answer

Memory access is fastest when it can take place at word-aligned memory addresses. The simplest example is the following struct (which @Didier also used):

struct sample {
   char a;
   int b;
};

By default, GCC inserts padding, so a is at offset 0, and b is at offset 4 (word-aligned). Without padding, b isn't word-aligned, and access is slower.
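The difference in layout can be checked directly with offsetof. The following is a minimal sketch, assuming a compiler that accepts #pragma pack(push, 1) (GCC and MSVC both do); the names sample_packed and bytes_saved_by_packing are made up for illustration and are not part of the original question.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Default layout: the compiler pads after `a` so `b` lands on its natural alignment. */
struct sample {
    char a;
    int b;
};

/* Same members with 1-byte packing; `sample_packed` is a hypothetical name. */
#pragma pack(push, 1)
struct sample_packed {
    char a;
    int b;
};
#pragma pack(pop)

/* Returns the number of padding bytes that packing removed from the struct. */
static size_t bytes_saved_by_packing(void)
{
    /* With pack(1), `b` immediately follows the single byte of `a`. */
    assert(offsetof(struct sample_packed, b) == 1);
    /* Without packing, `b` sits at a multiple of its own alignment. */
    assert(offsetof(struct sample, b) % _Alignof(int) == 0);
    return sizeof(struct sample) - sizeof(struct sample_packed);
}
```

Using push and pop around the packed definition keeps the packing from leaking into headers that are included afterwards.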

How much slower?

- For 32-bit x86, according to the Intel 64 and IA-32 Architectures Software Developer's Manual:
  "The processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access."
  As with most performance questions, you'd have to benchmark your application to see how much of an issue this is in practice.
- According to Wikipedia, x86 extensions like SSE2 require alignment for some instructions. For instance, the aligned 128-bit move MOVDQA faults unless its memory operand is 16-byte aligned.
- Many other architectures require aligned access (and will generate SIGBUS errors when data is accessed at a misaligned address).
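One common way to stay safe on strict-alignment machines is to copy a possibly misaligned field into an aligned local with memcpy instead of dereferencing a cast pointer. This is a sketch of that idea rather than anything from the original answer, and the helper name load_u32_unaligned is an assumption chosen for illustration.

```c
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Read a 32-bit value from an address that may not be 4-byte aligned.
 * memcpy has no alignment requirement on its arguments, so this avoids
 * dereferencing a misaligned uint32_t pointer. `load_u32_unaligned` is a
 * hypothetical helper name. The value is returned in host byte order. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers typically turn a fixed-size memcpy like this into a plain load on targets that tolerate unaligned access, but that is worth confirming in the generated code for your own toolchain.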

Regarding portability: I assume that you're using #pragma pack(1) so that you can send structs across the wire and to and from disk without worrying about different compilers or platforms packing structs differently. This is valid; however, there are a couple of issues to keep in mind:

- This does nothing to handle big-endian versus little-endian issues. You can handle these by calling the htons family of functions (htons and htonl when writing, ntohs and ntohl when reading) on any multi-byte integer fields in your structs.
- In my experience, working with packed, serializable structs in application code isn't a lot of fun. They're very difficult to modify and extend without breaking backwards compatibility, and as already noted, there are performance penalties. Consider transferring your packed, serializable structs' contents into equivalent non-packed, extensible structs for processing, or consider using a full-fledged serialization library like Protocol Buffers (which has C bindings).
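Both suggestions can be combined: keep the packed struct only at the I/O boundary, and convert to a native struct while fixing byte order. The sketch below assumes a POSIX system (for arpa/inet.h). The wire_msg layout, its fields, and the function names are hypothetical, invented for illustration only.

```c
#include <arpa/inet.h>   /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>      /* memcpy */

/* Hypothetical on-the-wire layout: packed, multi-byte fields in network order. */
#pragma pack(push, 1)
struct wire_msg {
    uint8_t  type;
    uint32_t length;   /* network byte order */
    uint16_t flags;    /* network byte order */
};
#pragma pack(pop)

/* Native struct used by application code: default alignment, host byte order. */
struct msg {
    uint8_t  type;
    uint32_t length;
    uint16_t flags;
};

/* Decode a received buffer (at least sizeof(struct wire_msg) bytes) into a native struct. */
static struct msg msg_from_wire(const void *buf)
{
    struct wire_msg w;
    memcpy(&w, buf, sizeof w);        /* buf may be misaligned; copy first */

    struct msg m;
    m.type   = w.type;
    m.length = ntohl(w.length);
    m.flags  = ntohs(w.flags);
    return m;
}

/* Encode a native struct into a buffer of at least sizeof(struct wire_msg) bytes. */
static void msg_to_wire(const struct msg *m, void *buf)
{
    struct wire_msg w;
    w.type   = m->type;
    w.length = htonl(m->length);
    w.flags  = htons(m->flags);
    memcpy(buf, &w, sizeof w);
}
```

With this split, only the two conversion functions touch packed fields, so the rest of the code works on naturally aligned data and the native struct can grow without changing the wire format.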

156

answered Oct 21 '22 17:10

Josh Kelley

Tags:

c

gcc

Nicolas
