Is there a standard macro to detect architectures requiring aligned memory access?

Assuming something like:

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  for(i=0; i<len; i++)
  {
     dest[i] = src[i] & mask[i];
  }
}

I can go faster on a non-aligned access machine (e.g. x86) by writing something like:

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  unsigned int wordlen = len >> 2;
  for(i=0; i<wordlen; i++)
  {
    ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i]; // this raises SIGBUS on SPARC and other archs that require aligned access.
  }
  for(i=wordlen<<2; i<len; i++){
    dest[i] = src[i] & mask[i];
  }
}

However it needs to build on several architectures so I would like to do something like:

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  unsigned int wordlen = len >> 2;

#if defined(__ALIGNED2__) || defined(__ALIGNED4__) || defined(__ALIGNED8__)
  // go slow
  for(i=0; i<len; i++)
  {
     dest[i] = src[i] & mask[i];
  }
#else
  // go fast
  for(i=0; i<wordlen; i++)
  {
    // the following line will raise SIGBUS on SPARC and other archs that require aligned access.
    ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i]; 
  }
  for(i=wordlen<<2; i<len; i++){
    dest[i] = src[i] & mask[i];
  }
#endif
}

But I cannot find any good information on compiler-defined macros (like my hypothetical __ALIGNED4__ above) that specify alignment, or any clever ways of using the preprocessor to determine the target architecture's alignment requirements. I could just test defined (__SVR4) && defined (__sun), but I would prefer something that will Just Work™ on other architectures requiring aligned memory accesses.

What is aligned memory access?

A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.
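
To make the definition concrete, here is a minimal sketch of an alignment check. The helper name is_aligned is made up for illustration, and it assumes that n is a power of two and that converting a pointer to uintptr_t preserves the numeric address, which holds on mainstream platforms but is implementation-defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is aligned to n bytes, 0 otherwise.
   Assumes n is a power of two (1, 2, 4, 8, ...). */
static int is_aligned(const void *p, size_t n)
{
    /* An n-byte aligned address has its low log2(n) bits clear. */
    return ((uintptr_t)p & (uintptr_t)(n - 1)) == 0;
}
```

With n = 1 the check always succeeds, which matches the note that byte accesses are always aligned.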

Does ARM support unaligned access?

By default, ARM7- and ARM9-based microcontrollers do not allow unaligned accesses to 16-bit and 32-bit data types. Cortex-M3 does support unaligned accesses, so the word-wise version above would behave correctly there.

Why is memory alignment needed?

The CPU can operate on an aligned word of memory atomically, meaning that no other instruction can interrupt that operation. This is critical to the correct operation of many lock-free data structures and other concurrency paradigms.

What is unaligned address?

An unaligned address is an address that isn't a multiple of the transfer size. The meaning in AXI4 is the same.

While x86 silently fixes up unaligned accesses, this is hardly optimal for performance. It is usually best to assume a certain alignment and perform fixups yourself:

#include <stddef.h>
#include <stdint.h>

unsigned int const alignment = 8;   /* or 16, or sizeof(long) */

/* Named my_memcpy so it does not clash with the standard memcpy. */
void my_memcpy(char *dst, char const *src, size_t size) {
    if((((uintptr_t)dst) % alignment) != (((uintptr_t)src) % alignment)) {
        /* no common alignment, copy as bytes or shift around */
    } else {
        if(((uintptr_t)dst) % alignment) {
            /* copy bytes at the beginning, until dst is aligned */
        }
        /* copy words in the middle */
        if(((uintptr_t)dst + size) % alignment) {
            /* copy bytes at the end */
        }
    }
}
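
Applied to the question's function, the skeleton above could be filled in roughly as follows. This is a sketch, not a tuned implementation: the name mask_bytes_aligned is made up, the word size is fixed at 4 bytes, and the uint32_t casts assume the compiler tolerates this kind of type punning (with GCC or Clang you may want -fno-strict-aliasing if the buffers are declared as char arrays).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. ANDs src with mask into dest, using aligned 32-bit
   accesses when all three pointers share the same misalignment. */
void mask_bytes_aligned(unsigned char *dest, const unsigned char *src,
                        const unsigned char *mask, size_t len)
{
    const uintptr_t low = sizeof(uint32_t) - 1;   /* low address bits, 3 */
    size_t i = 0;

    /* Word access is only safe if the low address bits of all three agree. */
    if ((((uintptr_t)dest ^ (uintptr_t)src) & low) == 0 &&
        (((uintptr_t)dest ^ (uintptr_t)mask) & low) == 0) {
        /* Head: single bytes until dest (and so src and mask) is aligned. */
        while (i < len && ((uintptr_t)(dest + i) & low) != 0) {
            dest[i] = src[i] & mask[i];
            i++;
        }
        /* Middle: aligned 32-bit words (assumes type punning is tolerated). */
        while (len - i >= sizeof(uint32_t)) {
            *(uint32_t *)(dest + i) =
                *(const uint32_t *)(src + i) & *(const uint32_t *)(mask + i);
            i += sizeof(uint32_t);
        }
    }
    /* Tail, or the whole buffer when there is no common alignment. */
    for (; i < len; i++)
        dest[i] = src[i] & mask[i];
}
```

Because every word access is aligned, this version should not raise SIGBUS on strict-alignment targets; the price is that buffers with mismatched alignment fall back to the byte loop.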

Also, take a look at SIMD instructions.

The standard approach would be to have a configure script that runs a program to test for alignment issues. If the test program doesn't crash, the configure script defines a macro in a generated config header that allows for the faster implementation. The safer implementation is the default.

void mask_bytes(unsigned char* dest, unsigned char* src, unsigned char* mask, unsigned int len)
{
  unsigned int i;
  unsigned int wordlen = len >> 2;

#if defined(UNALIGNED)
  // go fast
  for(i=0; i<wordlen; i++)
  {
    // the following line will raise SIGBUS on SPARC and other archs that require aligned access.
    ((uint32_t*)dest)[i] = ((uint32_t*)src)[i] & ((uint32_t*)mask)[i]; 
  }
  for(i=wordlen<<2; i<len; i++){
    dest[i] = src[i] & mask[i];
  }
#else
  // go slow
  for(i=0; i<len; i++)
  {
     dest[i] = src[i] & mask[i];
  }
#endif
}
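
If running a test program at configure time is not an option (for example when cross-compiling), one alternative that is not part of the answer above is to load and store the words through memcpy. This is a sketch under the assumption that an optimizing compiler turns a fixed-size 4-byte memcpy into a single load or store where the target allows it; that is common but not guaranteed. The name mask_bytes_memcpy is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Well-defined for any pointer alignment because the
   words are copied in and out with memcpy instead of pointer casts. */
void mask_bytes_memcpy(unsigned char *dest, const unsigned char *src,
                       const unsigned char *mask, size_t len)
{
    size_t i = 0;

    /* Word-sized chunks; the compiler picks a suitable access sequence. */
    for (; len - i >= sizeof(uint32_t); i += sizeof(uint32_t)) {
        uint32_t s, m;
        memcpy(&s, src + i, sizeof s);
        memcpy(&m, mask + i, sizeof m);
        s &= m;
        memcpy(dest + i, &s, sizeof s);
    }
    /* Remaining 0 to 3 bytes. */
    for (; i < len; i++)
        dest[i] = src[i] & mask[i];
}
```

This needs no UNALIGNED macro at all, since the code is valid C on every architecture and the speed difference is left to the compiler.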

(I find it weird that you have src and mask when really these commute. I renamed mask_bytes to memand. But anyways...)

Another option is to use different functions that take advantage of C's types. For instance:

#include <stddef.h>

void memand_bytes(char *dest, const char *src1, const char *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

/* len counts ints here, not bytes */
void memand_ints(int *dest, const int *src1, const int *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

This way you let the programmer decide.
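
As a small usage sketch of the typed variant: data that is declared as int is already suitably aligned, so the caller can pick the int version directly. The caller combine_flags and the constant N_FLAGS are made up for illustration, and memand_ints is repeated (with const added to the sources) so the snippet stands alone.

```c
#include <stddef.h>

/* Copy of the int variant above; len is a count of ints, not bytes. */
static void memand_ints(int *dest, const int *src1, const int *src2, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        dest[i] = src1[i] & src2[i];
}

#define N_FLAGS 4   /* hypothetical array size */

/* Hypothetical caller: the arrays are int objects, hence int-aligned. */
void combine_flags(int out[N_FLAGS], const int a[N_FLAGS], const int b[N_FLAGS])
{
    memand_ints(out, a, b, N_FLAGS);   /* pass the element count, not sizeof */
}
```

The trade-off is that a caller holding only byte buffers of unknown alignment still has to use memand_bytes.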

Tags:

c

c-preprocessor

memory-alignment

nolandda

3 Answers

Simon Richter

outis

Robert Martin
