 

Is there a standard macro to detect architectures requiring aligned memory access?

Assuming something like:

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  for(i=0; i<len; i++)
  {
     dest[i] = src[i] & mask[i];
  }
}

On a machine that tolerates unaligned access (e.g. x86), I can go faster by processing four bytes at a time:

#include <stdint.h>

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  unsigned int wordlen = len >> 2;   /* number of whole 32-bit words */
  for(i=0; i<wordlen; i++)
  {
    // this raises SIGBUS on SPARC and other archs that require aligned access
    // whenever the pointers are not 4-byte aligned.
    ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
  }
  for(i=wordlen<<2; i<len; i++){   /* leftover 0-3 bytes */
    dest[i] = src[i] & mask[i];
  }
}
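For comparison, a commonly used portable way to get word-at-a-time processing (not the question's code) is to move each word through `memcpy` into a local variable. `memcpy` has no alignment or aliasing requirements, and GCC and Clang typically compile a fixed-size 4-byte `memcpy` into a single load or store on targets that allow unaligned access, and into a safe instruction sequence elsewhere. The name `mask_bytes_memcpy` is hypothetical; this is a sketch, not a benchmark-tested implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: word-wise AND that never dereferences a cast pointer. */
void mask_bytes_memcpy(unsigned char *dest, const unsigned char *src,
                       const unsigned char *mask, size_t len)
{
    size_t i = 0;

    /* Word loop: memcpy into local uint32_t values is valid for any
       alignment, so this cannot trap on strict-alignment targets. */
    for (; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
        uint32_t s, m, d;
        memcpy(&s, src + i, sizeof s);
        memcpy(&m, mask + i, sizeof m);
        d = s & m;                      /* AND is byte-order independent */
        memcpy(dest + i, &d, sizeof d);
    }
    /* Tail loop for the remaining 0-3 bytes. */
    for (; i < len; i++)
        dest[i] = src[i] & mask[i];
}
```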

However, it needs to build on several architectures, so I would like to do something like:

#include <stdint.h>

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  unsigned int wordlen = len >> 2;

/* __ALIGNED2__ etc. are hypothetical macros: this is what I am looking for */
#if defined(__ALIGNED2__) || defined(__ALIGNED4__) || defined(__ALIGNED8__)
  // go slow
  for(i=0; i<len; i++)
  {
     dest[i] = src[i] & mask[i];
  }
#else
  // go fast
  for(i=0; i<wordlen; i++)
  {
    // the following line will raise SIGBUS on SPARC and other archs that require aligned access.
    ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
  }
  for(i=wordlen<<2; i<len; i++){
    dest[i] = src[i] & mask[i];
  }
#endif
}

But I cannot find any good information on compiler-defined macros (like my hypothetical __ALIGNED4__ above) that describe alignment requirements, or any clever way of using the preprocessor to determine the target architecture's alignment rules. I could just test defined(__SVR4) && defined(__sun), but I would prefer something that will Just Work™ on other architectures requiring aligned memory accesses.
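The whitelist approach mentioned above (testing known platform macros) can be sketched as follows. `__i386__` and `__x86_64__` are predefined by GCC and Clang, `_M_IX86` and `_M_X64` are the MSVC equivalents, and `__ARM_FEATURE_UNALIGNED` is defined by compilers following the ARM C Language Extensions when unaligned access is enabled. `HAVE_FAST_UNALIGNED` and `have_fast_unaligned` are hypothetical names, and the list is illustrative rather than exhaustive. Note that this only records what the hardware tolerates; dereferencing a misaligned cast pointer is still undefined behaviour in ISO C.

```c
/* Hypothetical project macro: 1 only on targets known to handle unaligned
   loads in hardware; every other target gets the safe byte loop. */
#if defined(__i386__) || defined(__x86_64__) || \
    defined(_M_IX86) || defined(_M_X64) || \
    defined(__ARM_FEATURE_UNALIGNED)
#  define HAVE_FAST_UNALIGNED 1
#else
#  define HAVE_FAST_UNALIGNED 0
#endif

/* Hypothetical helper exposing the compile-time decision at run time,
   e.g. for logging or tests. */
int have_fast_unaligned(void)
{
    return HAVE_FAST_UNALIGNED;
}
```

Defaulting to 0 means an unknown architecture silently gets the slow but correct path rather than a SIGBUS.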

asked Dec 07 '11 15:12 by nolandda


3 Answers

While x86 silently handles unaligned accesses in hardware, they are not free: an access that straddles a cache-line boundary, for example, is slower than an aligned one. It is usually best to assume a certain alignment and handle the misaligned parts yourself:

#include <stddef.h>
#include <stdint.h>

unsigned int const alignment = 8;   /* or 16, or sizeof(long) */

/* called copy_aligned rather than memcpy to avoid clashing with the standard library */
void copy_aligned(char *dst, char const *src, size_t size) {
    if((((uintptr_t)dst) % alignment) != (((uintptr_t)src) % alignment)) {
        /* no common alignment, copy as bytes or shift around */
    } else {
        if(((uintptr_t)dst) % alignment) {
            /* copy bytes at the beginning */
        }
        /* copy words in the middle */
        if(((uintptr_t)dst + size) % alignment) {
            /* copy bytes at the end */
        }
    }
}
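One way to fill in the skeleton above, as a sketch: the alignment is taken to be `sizeof(unsigned long)` so that the middle loop can move one `unsigned long` at a time, and the names `copy_words` and `word` are hypothetical. Storing through a cast `word *` is the traditional technique, but it is formally a strict-aliasing violation when the buffers do not actually hold `unsigned long` objects, which is why such code is often built with `-fno-strict-aliasing`.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned long word;   /* hypothetical: the chunk copied per step */

void copy_words(char *dst, char const *src, size_t size)
{
    const uintptr_t align = sizeof(word);

    if (((uintptr_t)dst % align) != ((uintptr_t)src % align)) {
        /* no common alignment: plain byte copy */
        while (size--)
            *dst++ = *src++;
        return;
    }
    /* head: copy bytes until dst (and therefore src) is word-aligned */
    while (size > 0 && ((uintptr_t)dst % align) != 0) {
        *dst++ = *src++;
        size--;
    }
    /* middle: both pointers are now aligned, so copy whole words */
    while (size >= align) {
        *(word *)dst = *(word const *)src;
        dst += align;
        src += align;
        size -= align;
    }
    /* tail: copy the remaining bytes */
    while (size--)
        *dst++ = *src++;
}
```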

Also, take a look at SIMD instructions (e.g. SSE2 on x86 or NEON on ARM), which process 16 bytes per instruction.
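As an illustration of the SIMD route on x86, here is a sketch using the SSE2 intrinsics `_mm_loadu_si128`, `_mm_and_si128` and `_mm_storeu_si128` from `<emmintrin.h>`. The `loadu`/`storeu` forms accept any alignment. It is guarded by `__SSE2__`, which GCC and Clang define when targeting SSE2 (MSVC does not, so there it falls back to the byte loop). The name `memand_simd` is hypothetical.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical name: ANDs two byte arrays 16 bytes at a time when SSE2
   is available, otherwise byte by byte. */
void memand_simd(unsigned char *dest, const unsigned char *src1,
                 const unsigned char *src2, size_t len)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* unaligned 16-byte loads/stores: no alignment requirement */
    for (; i + 16 <= len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(src1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src2 + i));
        _mm_storeu_si128((__m128i *)(dest + i), _mm_and_si128(a, b));
    }
#endif
    /* remaining bytes (or everything, without SSE2) */
    for (; i < len; i++)
        dest[i] = src1[i] & src2[i];
}
```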

answered Sep 30 '22 16:09 by Simon Richter


The standard approach would be to have a configure script that runs a program to test for alignment issues. If the test program doesn't crash, the configure script defines a macro in a generated config header that allows for the faster implementation. The safer implementation is the default.
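A minimal sketch of such a probe, assuming the configure script compiles it into a small program whose `main()` exits with status 0 only when `unaligned_probe()` returns 1, and then defines `UNALIGNED` in the generated header. The function name is hypothetical. It compares the misaligned load against a `memcpy` reference because some older ARM cores returned rotated data instead of trapping. The misaligned read is itself undefined behaviour, so this is a heuristic, and it cannot run at all when cross-compiling, where configure typically needs a fixed default.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical probe: returns 1 if a misaligned 32-bit load yields the
   correct value. On strict-alignment targets it may instead die with
   SIGBUS, which configure treats as "no fast path". */
int unaligned_probe(void)
{
    /* the union makes the buffer at least uint32_t-aligned, so
       bytes + 1 is guaranteed to be misaligned */
    union { uint32_t words[2]; unsigned char bytes[8]; } buf;
    volatile uint32_t *p;
    uint32_t expected, got;

    memset(&buf, 0, sizeof buf);
    buf.bytes[1] = 0x11;
    buf.bytes[2] = 0x22;
    buf.bytes[3] = 0x33;
    buf.bytes[4] = 0x44;

    memcpy(&expected, buf.bytes + 1, sizeof expected);  /* safe reference */
    p = (volatile uint32_t *)(void *)(buf.bytes + 1);   /* misaligned */
    got = *p;                                           /* may trap here */
    return got == expected;
}
```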

#include <stdint.h>

/* UNALIGNED is defined (or not) by the config header that configure generates */
void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  unsigned int wordlen = len >> 2;

#if defined(UNALIGNED)
  // go fast
  for(i=0; i<wordlen; i++)
  {
    // the following line raises SIGBUS on SPARC and other archs that require aligned access
    // if the pointers are not 4-byte aligned.
    ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i];
  }
  for(i=wordlen<<2; i<len; i++){
    dest[i] = src[i] & mask[i];
  }
#else
  // go slow (the safe default)
  for(i=0; i<len; i++)
  {
     dest[i] = src[i] & mask[i];
  }
#endif
}
answered Sep 30 '22 18:09 by outis


(I find it odd that you call the arguments src and mask when the operation commutes, so I renamed them src1 and src2 and renamed mask_bytes to memand. Anyway...)

Another option is to write separate functions that take advantage of C's types. For instance:

#include <stddef.h>

/* AND len bytes */
void memand_bytes(char *dest, char *src1, char *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

/* AND len ints (not bytes); int pointers are properly aligned by construction */
void memand_ints(int *dest, int *src1, int *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

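This way the caller decides: data known to be int-aligned (for example, because it was allocated as an int array) goes through memand_ints, and everything else goes through memand_bytes.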
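If the caller cannot know the alignment in advance, the choice can also be made at run time by checking the pointers. The sketch below uses C11 `_Alignof` and hypothetical names (`int_aligned`, `memand_dispatch`). As with any cast from `unsigned char *` to `int *`, the int path is only strictly conforming when the buffers really hold `int` objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p meets int's alignment requirement. */
static int int_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}

/* Hypothetical dispatcher: int-sized ANDs for the bulk of the data when
   all three pointers are int-aligned, plain bytes otherwise. */
void memand_dispatch(unsigned char *dest, const unsigned char *src1,
                     const unsigned char *src2, size_t len)
{
    size_t i = 0;

    if (int_aligned(dest) && int_aligned(src1) && int_aligned(src2)) {
        size_t nints = len / sizeof(int);
        int *d = (int *)(void *)dest;
        const int *a = (const int *)(const void *)src1;
        const int *b = (const int *)(const void *)src2;
        for (size_t k = 0; k < nints; k++)
            d[k] = a[k] & b[k];
        i = nints * sizeof(int);   /* continue with the leftover bytes */
    }
    for (; i < len; i++)
        dest[i] = src1[i] & src2[i];
}
```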

answered Sep 30 '22 17:09 by Robert Martin