Before you mark this as duplicate, please do read the question.
So this may be a potentially very stupid question, but it is bothering me. I know, from reading as well as from many other SO questions, that fields in a struct in C are not guaranteed to be contiguous because of padding added by the compiler. For example, according to the C standard:
13/ Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
I was working on writing a program similar to the Unix readelf and nm just for fun, and it involves a lot of work dealing with bytes at specific offsets into the file to read certain values. For example, the first 64 bytes of a 64-bit object file contain the "file header". The file header's bytes 0x00-0x04 encode an int, while 0x20-0x28 encode a pointer, etc. However, I noticed in the original implementation of readelf.c that the programmer does something like this:
First, they declare a struct (let's call it ELF_H) with fields corresponding to the things in the file header (i.e. the first field is an int just like the first 4 bytes in the file header are, the second is a char because bytes 0x04-0x05 in the ELF header encode a char, etc.). Then they copy the entire ELF file into memory and cast the pointer to the start of this memory to a pointer to ELF_H. Something like:
FILE *file = fopen("filename", "rb");
void *start_of_file = malloc(/* size_of_file */);
fread(start_of_file, 1, /* size_of_file */, file); // copies entire file into memory
ELF_H hdr = *(ELF_H *) start_of_file; // cast the pointer to a struct pointer and dereference it
and after doing this, they just access each section of the header through the member variables of the struct. So instead of getting what is supposed to be at byte 0x04 using pointer arithmetic, they just do hdr.member2 (which in the struct is the second member, following the first one, which was an int).
How is this meant to work if fields in a struct aren't guaranteed to be contiguous?
The closest answer I could find to this was here but in that example, the members of the struct are of the same type. In the ELF_H, they are of different types.
Thank you in advance :)
If the data in the file was written from a padded struct of the form being read, then the padding is irrelevant; the file contains padding as does the memory representation.
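To make that concrete, here is a minimal sketch in which a byte buffer stands in for the file contents. The struct record type and the round_trip_ok function are hypothetical names made up for this illustration, not anything from readelf. The writer copies the whole struct, padding included, into the image, and the reader copies it back into a struct with the same declaration, so the padding lands in the same place on both sides.

```c
#include <string.h>

/* Hypothetical record type, not a real ELF header. The char is followed by a
   long, so typical ABIs place padding between `kind` and `offset`. */
struct record {
    int  magic;
    char kind;
    long offset;
};

/* Simulates writing a struct to a file and reading it back.
   `image` stands in for the bytes an fwrite/fread pair would move.
   Returns 1 if every named member survived the round trip, 0 otherwise. */
int round_trip_ok(void)
{
    unsigned char image[sizeof(struct record)];
    struct record out;
    struct record in;

    memset(&out, 0, sizeof out);
    out.magic = 0x464c457f;
    out.kind = 2;
    out.offset = 64;

    /* "Write": the padding bytes are copied along with the members. */
    memcpy(image, &out, sizeof out);

    /* "Read": same struct declaration, so the same layout is assumed. */
    memcpy(&in, image, sizeof in);

    return in.magic == out.magic &&
           in.kind == out.kind &&
           in.offset == out.offset;
}
```

This only holds when the writer and the reader agree on the layout, which in practice means the same struct declaration compiled for the same ABI.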
It's true the standard isn't particularly restrictive, and a compiler could insert random padding in the ELF reader struct that the tool that wrote the ELF didn't match. But in practice, the "unnamed padding" is for alignment purposes, and all major compilers have predictable behavior there; they pad to align the fields to match their type. So int fields (on systems with four-byte int) are preceded by 1-3 pad bytes if the previous field didn't end on a four-byte boundary, char fields get no padding, etc. In this case, no compiler I know of would insert padding between a leading int field and a following char[2], because char has no required alignment anyway.
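You can check this on your own compiler with offsetof. The sketch below uses two made-up structs (the names are mine, purely for illustration): one with an int followed by a char[2], and one with a char followed by an int.

```c
#include <stddef.h>

/* int first, then chars: the chars can start right after the int. */
struct int_then_chars {
    int  a;
    char b[2];
};

/* char first, then int: the int has to be moved up to an aligned offset. */
struct char_then_int {
    char c;
    int  d;
};

/* Number of padding bytes between `a` and `b`. */
size_t gap_after_int(void)
{
    return offsetof(struct int_then_chars, b) - sizeof(int);
}

/* Number of padding bytes between `c` and `d`. */
size_t gap_before_int(void)
{
    return offsetof(struct char_then_int, d) - sizeof(char);
}

/* Returns 1 if `d` sits on a multiple of int's alignment (C11 _Alignof). */
int int_member_is_aligned(void)
{
    return offsetof(struct char_then_int, d) % _Alignof(int) == 0;
}
```

On the compilers I am aware of, gap_after_int() is 0, while gap_before_int() is whatever it takes to reach int's alignment (3 on a typical system with four-byte int).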
It's also possible to use non-standard compiler extensions to prevent padding to align fields in the struct, but it's not necessary if your struct definition would never have an unaligned field anyway (because you always put smaller fields after larger fields, or because you always group smaller fields together to maintain the alignment requirements of subsequent larger fields).
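If you want the compiler to confirm that a struct really has no padding, C11 static assertions on offsetof and sizeof will do it. The following is a sketch under assumptions: struct file_hdr is a hypothetical 16-byte header invented for this example (it is not the ELF header), with members ordered largest to smallest, and read_hdr is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte header, ordered largest to smallest so that every
   member is naturally aligned without any padding. */
struct file_hdr {
    uint64_t entry;    /* expected at offset 0  */
    uint32_t magic;    /* expected at offset 8  */
    uint16_t type;     /* expected at offset 12 */
    uint8_t  class_;   /* expected at offset 14 */
    uint8_t  version;  /* expected at offset 15 */
};

/* Compilation fails if the compiler's layout differs from the file format. */
_Static_assert(offsetof(struct file_hdr, magic) == 8, "padding before magic");
_Static_assert(offsetof(struct file_hdr, type) == 12, "padding before type");
_Static_assert(offsetof(struct file_hdr, class_) == 14, "padding before class_");
_Static_assert(offsetof(struct file_hdr, version) == 15, "padding before version");
_Static_assert(sizeof(struct file_hdr) == 16, "unexpected padding in file_hdr");

/* Copies a header out of raw file bytes. memcpy is used instead of casting
   the buffer pointer, so the buffer itself does not need to be aligned. */
struct file_hdr read_hdr(const unsigned char *bytes)
{
    struct file_hdr h;
    memcpy(&h, bytes, sizeof h);
    return h;
}
```

The multi-byte members still come out in the host's byte order, so this only matches the file when the file's endianness matches the machine's.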