
C - get type alignment portably

I'm writing a really small interpreter for a very simple language, which allows for simple structure definitions (made of other structures and simple types, like int, char, float, double and so on). I want fields to use as little alignment as possible, so using max_align_t or something similar is out of the question. Now, I wonder if there is a nicer way to get the alignment of any single type other than this:

#include <stdio.h>
#include <stddef.h>

#define GA(type, name) struct GA_##name { char c; type d; }; \
    const unsigned int alignment_for_##name = offsetof(struct GA_##name, d);

GA(int, int);
GA(short, short);
GA(char, char);
GA(float, float);
GA(double, double);
GA(char*, char_ptr);
GA(void*, void_ptr);

#define GP(type, name) printf("alignment of "#name" is: %u\n", alignment_for_##name);

int main() {
    GP(int, int);
    GP(short, short);
    GP(char, char);
    GP(float, float);
    GP(double, double);
    GP(char*, char_ptr);
    GP(void*, void_ptr);
}

This works, but maybe there is something nicer?

Jędrzej Dudkiewicz asked Feb 11 '15 17:02


2 Answers

There is in C11, which adds _Alignof:

printf("Alignment of int: %zu\n", _Alignof(int));

It is usually better style to include <stdalign.h>, and use the lowercase alignof:

#include <stdalign.h>

printf("Alignment of int: %zu\n", alignof(int));
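For the interpreter in the question, the operator's result can go straight into a lookup table, since alignof yields a constant expression. The following is a minimal sketch, assuming a C11 compiler; the prim_kind enum, the prim_align table and the prim_alignment function are hypothetical names made up for illustration:

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical tags for the interpreter's primitive types. */
enum prim_kind {
    PRIM_CHAR, PRIM_SHORT, PRIM_INT, PRIM_FLOAT, PRIM_DOUBLE, PRIM_PTR,
    PRIM_COUNT
};

/* Alignment of each primitive, as reported by the compiler. */
static const size_t prim_align[PRIM_COUNT] = {
    [PRIM_CHAR]   = alignof(char),
    [PRIM_SHORT]  = alignof(short),
    [PRIM_INT]    = alignof(int),
    [PRIM_FLOAT]  = alignof(float),
    [PRIM_DOUBLE] = alignof(double),
    [PRIM_PTR]    = alignof(void *)
};

/* Look up the alignment for one primitive kind. */
size_t prim_alignment(enum prim_kind kind)
{
    return prim_align[kind];
}
```

A call such as prim_alignment(PRIM_DOUBLE) then returns whatever the target platform requires for double, with no helper structs involved.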

You can check for C11 this way:

#if __STDC_VERSION__ >= 201112L
    /* C11 */
#else
    /* not C11 */
#endif
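The version check can be combined with the struct-and-offsetof trick from the question to get one macro that works in both modes. This is only a sketch: ALIGN_OF is a hypothetical name, and the fallback branch relies on the compiler accepting a struct definition inside offsetof, which GCC does but the standard does not promise:

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* C11 or later: use the built-in operator. */
    #define ALIGN_OF(type) _Alignof(type)
#else
    /* Pre-C11 fallback: offset of a member placed right after a single char.
       Assumption: the compiler accepts a struct definition inside offsetof. */
    #define ALIGN_OF(type) offsetof(struct { char c; type d; }, d)
#endif
```

Either branch expands to a constant expression of type size_t, so ALIGN_OF(int) can be used in array sizes and static initializers.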

If you're using GCC or Clang, you can compile your code in C11 mode by adding -std=c11 (or -std=gnu11 if you also want GNU extensions). Older releases default to gnu89 (GCC) and gnu99 (Clang); GCC 5 and later default to gnu11 or newer.


Update:

You might not need to check the alignment for your system at all, if you make a few educated guesses. I would recommend using one of these two orderings:

// non-embedded use
long double, long long, void (*)(void), void*, double, long, float, int, short, char

// embedded use (microcontrollers)
long double, long long, double, long, float, void (*)(void), void*, int, short, char

This ordering is perfectly portable (but not always optimal), since the worst-case scenario is just that you get more padding than otherwise.

What follows is an (admittedly lengthy) rationale. Feel free to skip everything past this point if you don't care how I came to this conclusion for the ordering.


Covering most cases

This holds on practically every C implementation (strictly speaking, the standard orders the ranges of these types rather than their sizes):

// for both `signed` and `unsigned`
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
sizeof(float) <= sizeof(double) <= sizeof(long double)

With a bit of tinkering with the order, you should be able to get unpadded structures in most cases. Please note that this does not guarantee that the structure will be unpadded; but it will be in most real-world cases, which should be GoodEnough™.

What follows is advice that is implementation-specific, but should cover alignment in most cases.

This is, however, perfectly portable everywhere (even if not optimal) — as long as you accept that there may be some padding (in other words, don't assume the structure is without any padding). If you get it wrong, all you'll get is a larger structure, so there's no danger of hitting undefined behavior of any kind.

What you should do is order these from largest to smallest, since their alignment will also be in this order. Assuming a typical amd64 compiler:

long long a; // 8-byte
long b;      // 8-byte or 4-byte; already aligned in both cases
int c;       // 4-byte; already aligned
short d, e;  // 2-byte; both already aligned
char f;      // 1-byte; always aligned
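To see the effect of the ordering, the same set of fields can be declared twice and the sizes compared. This is an illustrative sketch; the struct names and the bytes_saved_by_sorting helper are made up for the example:

```c
#include <stddef.h>

/* Fields in arbitrary order: padding is typically inserted before
   `b` and `d`, and at the end of the struct. */
struct unsorted {
    char a;
    long long b;
    char c;
    int d;
    short e;
};

/* The same fields, ordered from largest to smallest. */
struct sorted {
    long long b;
    int d;
    short e;
    char a;
    char c;
};

/* Number of bytes the reordering saves on the current platform. */
size_t bytes_saved_by_sorting(void)
{
    return sizeof(struct unsorted) - sizeof(struct sorted);
}
```

On a typical amd64 compiler, struct unsorted occupies 32 bytes and struct sorted occupies 16; other targets may give different numbers.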

Integer types

So let's start figuring out our order, starting with integer types:

long long, long, int, short, char

Floating-point types

Now, the floating-point types. What do you do with double? Its alignment is typically 8 bytes on 64-bit architectures, and 4 bytes on 32-bit (but it can be 8-byte in some cases).

long long is always at least 8-byte (this is implicitly demanded by the standard because of its minimum range), and long is always at least 4-byte (but it's usually 8-byte in 64-bit; there are exceptions such as Windows).

What I would do is put double between those. Note that double can be 4 bytes in size (usually in embedded systems, such as AVR / Arduino), but those practically always have a 4-byte long.

long double is a complex case. Its alignment can range from 4-byte (say, x86 Linux) to 16-byte (amd64 Linux). Nevertheless, 4-byte alignment is a historical artifact and is suboptimal; so I'll presume it's at least 8-byte and put it above long long. This'll also make it optimal when its alignment is 16-byte.

This leaves float, which is practically always a 4-byte quantity, with 4-byte alignment; I'll put it between long, which is guaranteed to be at least 4-byte, and int, which can (typically) be 4 or 2-byte.

All of this combined gives us the next order:

long double, long long, double, long, float, int, short, char

Pointer types

All we have left now are pointer types. The sizes of different non-function pointer types are not necessarily the same, but I am going to assume they are (and it is true in the vast majority of, if not all, cases). I'll assume function pointers can be larger (think of a hardware architecture with more ROM than RAM), so I'll put them above the others.

Worst-case practical scenario is that they're the same, so I've achieved nothing; best-case is that I've eliminated some more padding.

But what about the size? This typically holds on non-embedded systems:

sizeof(long) <= sizeof(T*) <= sizeof(long long)

In most systems, sizeof(long) and sizeof(T*) are the same; but e.g. 64-bit Windows has 32-bit long, and 64-bit T*. However, in embedded systems, it's different; pointers there can be 16-bit, which means:

sizeof(int) <= sizeof(T*) <= sizeof(long)

What to do here is up to you; you are the one who knows where this will usually run. On one hand, optimizing for embedded targets when the primary use is on non-embedded systems means optimizing for the uncommon case. On the other hand, memory is more limited in embedded systems. Personally, I'd recommend optimizing for desktop use, unless you're specifically making an embedded application. Since the alignment of double is typically the same as pointer size but can be larger, I'd put this below double.

// non-embedded
long double, long long, void (*)(void), void*, double, long, float, int, short, char

For embedded uses, I'd put it below float, since alignment of float is usually 4-byte, but T* is 2-byte or 4-byte:

// embedded
long double, long long, double, long, float, void (*)(void), void*, int, short, char
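Because the interpreter from the question lays out its structures itself, it could also sort fields by their measured alignment instead of relying on a fixed type order. The following is a rough sketch under that assumption; struct field, align_up and layout_fields are hypothetical names, and alignments are assumed to be powers of two:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical field descriptor for the interpreter. */
struct field {
    size_t size;    /* size of the field in bytes */
    size_t align;   /* alignment, assumed to be a power of two */
    size_t offset;  /* filled in by layout_fields */
};

/* qsort comparator: larger alignment first. */
static int by_align_desc(const void *pa, const void *pb)
{
    const struct field *a = pa, *b = pb;
    return (a->align < b->align) - (a->align > b->align);
}

/* Round n up to the next multiple of align (a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Sorts the fields by descending alignment, assigns each an offset,
   and returns the total size including trailing padding. */
size_t layout_fields(struct field *fields, size_t count)
{
    size_t offset = 0, max_align = 1;

    qsort(fields, count, sizeof *fields, by_align_desc);
    for (size_t i = 0; i < count; i++) {
        offset = align_up(offset, fields[i].align);
        fields[i].offset = offset;
        offset += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    return align_up(offset, max_align);
}
```

Sorting changes the order of the array, so a real interpreter would need to keep a mapping from field names to the computed offsets.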
Tim Čas answered Nov 20 '22 14:11


This is probably not very portable, but GCC accepts the following:

#define alignof(type) offsetof(struct { char c; type d; }, d)

EDIT: And according to this answer, C allows casting to anonymous struct types (although I'd like to see this statement backed up). So the following should be portable:

#define alignof(type) ((size_t)&((struct { char c; type d; } *)0)->d)

Another approach using GNU statement expressions:

#define alignof(type) ({ \
    struct s { char c; type d; }; \
    offsetof(struct s, d); \
})
nwellnhof answered Nov 20 '22 14:11