Edit: ST does not allow to post more than two links for newbies. Sorry for the missing references.
I'm trying to reduce locking overhead in a C application where detecting changes on a global state is performance relevant. Even though I've been reading quite a lot on the topic lately (e.g. a lot from H. Sutter, and many more) I fail to be confident about my implementation. I would like to use a combination of a CAS like operation and DCL for a check on a Cache-Line Aligned global variable, thus avoiding false-sharing, to update thread local data from data shared among multiple threads. My lack of confidence is mainly due to
My questions:
The Type-Attributes documentation states:
This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:
(please see Type-Attributes documentation for declaration)
force the compiler to insure (as far as it can) that each variable whose type is
struct S
ormore_aligned_int
will be allocated and aligned at least on a8-byte
boundary. On a SPARC, having all variables of typestruct S
aligned to8-byte
boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.
Does that mean that the beginning of struct S
or more_aligned_int
will always be aligned to 8-byte
boundary? It does not mean the data will be padded to use exactly 64 bytes, right?
Assuming 1. is true that every instance of struct cache_line_aligned
(see code Example 1 below) aligns on 64-byte
boundaries and utilize exactly one cache-line (assuming cache-lines are 64 bytes
in length)
Using typedef
for the type declaration does not alter the semantics of __attribute__ ((aligned (64)))
(see code Example 2 below)
I do not need to use aligned_malloc
when instantiating the struct if struct is declared with __attribute__ ...
// Example 1
struct cache_line_aligned {
int version;
char padding[60];
} __attribute__ ((aligned (64)));
// Example 2
typedef struct {
int version;
// place '__attribute__ ((aligned (64)))' after 'int version'
// or at the end of the declaration
char padding[60];
} cache_line_aligned2 __attribute__ ((aligned (64)));
And finally a sketch of a function that uses the cache-line aligned approach to efficiently check if global state has been modified by some other thread:
void lazy_update_if_changed(int &t_version, char *t_data) {
// Assuming 'g_cache_line_aligned' is an instance of
// 'struct cache_line_aligned' or 'struct cache_line_aligned2'
// and variables prefixed with 't_' being thread local
if(g_cache_line_aligned.version == t_version) {
// do nothing and return
} else {
// enter critical section (acquire lock e.g. with pthread_mutex_lock)
t_version = g_cache_line_aligned.version
// read other data that requires locking where changes are notified
// by modifying 'g_cache_line_aligned.version', e.g. t_data
// leave critical section
}
}
Sorry for the long post.
Thank you!
When you define an aligned type, say, aligned to 8-byte boundaries, the compiler should make the type a multiple of the alignment (here, a multiple of 8 bytes) in size by padding.
The rationale for that is simple. Suppose you want to define an array of that aligned type. Naturally, every element of it should be aligned as well. That's why there may be padding.
Here's a little demonstration:
#include <stdio.h>
struct cache_line_aligned {
int version;
// char padding[60];
} __attribute__ ((aligned (64)));
int main(void)
{
struct cache_line_aligned s;
struct cache_line_aligned a[2];
printf("sizeof(struct cache_line_aligned) = %d\n", (int)sizeof(struct cache_line_aligned));
printf("sizeof(s) = %d\n", (int)sizeof(s));
printf("sizeof(a[0]) = %d\n", (int)sizeof(a[0]));
printf("sizeof(a) = %d\n", (int)sizeof(a));
return 0;
}
Output (ideone):
sizeof(struct cache_line_aligned) = 64
sizeof(s) = 64
sizeof(a[0]) = 64
sizeof(a) = 128
If you create an instance of struct cache_line_aligned
non-dynamically (IOW, not via malloc()
and such), just like in the above code, it will be aligned.
The C standard (from 1999) states for malloc()
, calloc()
and realloc()
:
The pointer returned if the allocation succeeds is suitably aligned so that
it may be assigned to a pointer to any type of object and then used to
access such an object or an array of such objects in the space allocated
(until the space is explicitly deallocated).
Where any type of object
does not include artificially aligned/padded types like the above struct because there isn't anything like __attribute__ ((aligned (64)))
in the C standard. This is a GNU extension here. For dynamically allocated objects with arbitrary alignment you have to use the appropriate memory allocation function or do the alignment manually (by allocating more memory and then "aligning" the pointer value).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With