Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lock-free check for modification of a global shared state in C using Cache-Line alignment

Edit: ST does not allow to post more than two links for newbies. Sorry for the missing references.

I'm trying to reduce locking overhead in a C application where detecting changes on a global state is performance relevant. Even though I've been reading quite a lot on the topic lately (e.g. a lot from H. Sutter, and many more) I fail to be confident about my implementation. I would like to use a combination of a CAS like operation and DCL for a check on a Cache-Line Aligned global variable, thus avoiding false-sharing, to update thread local data from data shared among multiple threads. My lack of confidence is mainly due to

  1. me failing to interpret the GNU documentation on Type-Attributes
  2. I seem not being able to find any literature and examples that I could easily translate to C, such as aligning-to-cache-line-and-knowing-the-cache-line-size on ST or 1 (although 1 seems to answer my question somewhat I'm not confident with my implementation)
  3. my experience with C is limited

My questions:

  1. The Type-Attributes documentation states:

    This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations:

    (please see Type-Attributes documentation for declaration)

    force the compiler to insure (as far as it can) that each variable whose type is struct S or more_aligned_int will be allocated and aligned at least on a 8-byte boundary. On a SPARC, having all variables of type struct S aligned to 8-byte boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency.

    Does that mean that the beginning of struct S or more_aligned_int will always be aligned to 8-byte boundary? It does not mean the data will be padded to use exactly 64 bytes, right?

  2. Assuming 1. is true that every instance of struct cache_line_aligned (see code Example 1 below) aligns on 64-byte boundaries and utilize exactly one cache-line (assuming cache-lines are 64 bytes in length)

  3. Using typedef for the type declaration does not alter the semantics of __attribute__ ((aligned (64))) (see code Example 2 below)

  4. I do not need to use aligned_malloc when instantiating the struct if struct is declared with __attribute__ ...

// Example 1
struct cache_line_aligned {
  int version;
  char padding[60];
} __attribute__ ((aligned (64)));

// Example 2
typedef struct {
  int version;  
  // place '__attribute__ ((aligned (64)))' after 'int version'
  // or at the end of the declaration 
  char padding[60];
} cache_line_aligned2 __attribute__ ((aligned (64)));

And finally a sketch of a function that uses the cache-line aligned approach to efficiently check if global state has been modified by some other thread:

void lazy_update_if_changed(int &t_version, char *t_data) {
  // Assuming 'g_cache_line_aligned' is an instance of 
  // 'struct cache_line_aligned' or 'struct cache_line_aligned2' 
  // and variables prefixed with 't_' being thread local 
  if(g_cache_line_aligned.version == t_version) {
    // do nothing and return
  } else {
    // enter critical section (acquire lock e.g. with pthread_mutex_lock) 
    t_version = g_cache_line_aligned.version
    // read other data that requires locking where changes are notified 
    // by modifying 'g_cache_line_aligned.version', e.g. t_data
    // leave critical section
  }
} 

Sorry for the long post.

Thank you!

like image 721
instilled Avatar asked Sep 25 '12 23:09

instilled


1 Answers

When you define an aligned type, say, aligned to 8-byte boundaries, the compiler should make the type a multiple of the alignment (here, a multiple of 8 bytes) in size by padding.

The rationale for that is simple. Suppose you want to define an array of that aligned type. Naturally, every element of it should be aligned as well. That's why there may be padding.

Here's a little demonstration:

#include <stdio.h>

struct cache_line_aligned {
  int version;
//  char padding[60];
} __attribute__ ((aligned (64)));

int main(void)
{
  struct cache_line_aligned s;
  struct cache_line_aligned a[2];
  printf("sizeof(struct cache_line_aligned) = %d\n", (int)sizeof(struct cache_line_aligned));
  printf("sizeof(s) = %d\n", (int)sizeof(s));
  printf("sizeof(a[0]) = %d\n", (int)sizeof(a[0]));
  printf("sizeof(a) = %d\n", (int)sizeof(a));
  return 0;
}

Output (ideone):

sizeof(struct cache_line_aligned) = 64
sizeof(s) = 64
sizeof(a[0]) = 64
sizeof(a) = 128

If you create an instance of struct cache_line_aligned non-dynamically (IOW, not via malloc() and such), just like in the above code, it will be aligned.

The C standard (from 1999) states for malloc(), calloc() and realloc():

The pointer returned if the allocation succeeds is suitably aligned so that
it may be assigned to a pointer to any type of object and then used to
access such an object or an array of such objects in the space allocated
(until the space is explicitly deallocated).

Where any type of object does not include artificially aligned/padded types like the above struct because there isn't anything like __attribute__ ((aligned (64))) in the C standard. This is a GNU extension here. For dynamically allocated objects with arbitrary alignment you have to use the appropriate memory allocation function or do the alignment manually (by allocating more memory and then "aligning" the pointer value).

like image 196
Alexey Frunze Avatar answered Oct 19 '22 14:10

Alexey Frunze