
How to determine if memory is aligned?

Tags: c, optimization, memory, simd, sse

I am new to optimizing code with SSE/SSE2 instructions and until now I have not gotten very far. To my knowledge a common SSE-optimized function would look like this:

    void sse_func(const float* const ptr, int len){
        if( ptr is aligned )
        {
            for( ... ){
                // unroll loop by 4 or 2 elements
            }
            for( ....){
                // handle the rest
                // (non-optimized code)
            }
        } else {
            for( ....){
                // regular C code to handle non-aligned memory
            }
        }
    }

However, how do I correctly determine whether the memory ptr points to is aligned to, e.g., 16 bytes? I think I have to include the regular C code path for non-aligned memory, as I cannot make sure that every block of memory passed to this function will be aligned. And using the intrinsics to load data from unaligned memory into the SSE registers seems to be horribly slow (even slower than regular C code).

Thank you in advance...

814

asked Dec 13 '09 23:12

user229898

2 Answers

    #include <stdint.h>  /* uintptr_t */

    #define is_aligned(POINTER, BYTE_COUNT) \
        (((uintptr_t)(const void *)(POINTER)) % (BYTE_COUNT) == 0)

The cast to void * (or, equivalently, char *) is necessary because the standard only guarantees an invertible conversion to uintptr_t for void *.

If you want type safety, consider using an inline function:

    #include <stddef.h>  /* size_t */
    #include <stdint.h>  /* uintptr_t */

    static inline _Bool is_aligned(const void *restrict pointer, size_t byte_count)
    {
        return (uintptr_t)pointer % byte_count == 0;
    }

and hope for compiler optimizations if byte_count is a compile-time constant.
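As a quick sanity check, the inline function can be tried on a buffer whose alignment is known. The sketch below is only an illustration: `alignment_demo` is a made-up name, `bool` from `<stdbool.h>` stands in for `_Bool`, and it assumes a C11 compiler (for `_Alignas`) and a 4-byte `float`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Variant of the inline function above: true when the address is a
   multiple of byte_count. */
static inline bool is_aligned(const void *pointer, size_t byte_count)
{
    assert(byte_count != 0); /* the modulo below needs a non-zero divisor */
    return (uintptr_t)pointer % byte_count == 0;
}

/* Hypothetical demo: returns true when all three expectations hold.
   Assumes sizeof(float) == 4. */
static bool alignment_demo(void)
{
    _Alignas(16) float buf[8] = {0};   /* C11: force 16-byte alignment */

    return is_aligned(buf, 16)         /* start of the array */
        && !is_aligned(buf + 1, 16)    /* 4 bytes past a 16-byte boundary */
        && is_aligned(buf + 4, 16);    /* 16 bytes past it */
}
```

When `byte_count` is a literal such as 16, an optimizing compiler will typically reduce the modulo to a bit test, but that is up to the compiler.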

Why do we need to convert to void * ?

The C language allows different representations for different pointer types, e.g. you could have a 64-bit void * type (the whole address space) and a 32-bit foo * type (a segment).

The conversion foo * -> void * might involve an actual computation, e.g. adding an offset. The standard also leaves it up to the implementation what happens when converting (arbitrary) pointers to integers, but I suspect that it is often implemented as a no-op.

For such an implementation, foo * -> uintptr_t -> foo * would work, but foo * -> uintptr_t -> void * and void * -> uintptr_t -> foo * wouldn't. The alignment computation would also not work reliably because you only check alignment relative to the segment offset, which might or might not be what you want.

In conclusion: Always use void * to get implementation-independent behaviour.
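Plugged into the function shape from the question, the check might look roughly like the sketch below. It is an illustration under assumptions, not a tuned implementation: `sum_floats` and the `HAVE_SSE` macro are made-up names, summing is just a stand-in for real work, and the SSE path is only compiled when the compiler reports SSE support. Unaligned input simply falls through to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1 /* hypothetical macro, local to this example */
#endif

/* Hypothetical example: sum len floats, using aligned SSE loads when
   ptr sits on a 16-byte boundary and plain C otherwise. */
float sum_floats(const float *ptr, size_t len)
{
    float total = 0.0f;
    size_t i = 0;

    assert(ptr != NULL || len == 0);

#ifdef HAVE_SSE
    if ((uintptr_t)(const void *)ptr % 16 == 0) {
        __m128 acc = _mm_setzero_ps();
        float lanes[4];

        /* aligned path: four floats per iteration */
        for (; i + 4 <= len; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(ptr + i));

        /* horizontal sum of the four partial sums */
        _mm_storeu_ps(lanes, acc);
        total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
#endif

    /* scalar path: the leftover elements, or everything if ptr was
       unaligned (or SSE is unavailable) */
    for (; i < len; ++i)
        total += ptr[i];

    return total;
}
```

A common alternative is to process a few leading elements in scalar code until the pointer reaches a 16-byte boundary and then switch to aligned loads, which avoids giving up SIMD entirely for unaligned input.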

144

answered Oct 16 '22 01:10

Christoph

EDIT: casting to long is a cheap way to protect oneself against the most likely possibility of int and pointers being different sizes nowadays.

As pointed out in the comments below, there are better solutions if you are willing to include a header...

A pointer p is aligned on a 16-byte boundary iff ((unsigned long)p & 15) == 0.
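If including a header is acceptable, the same mask test can be written with `uintptr_t` from `<stdint.h>`, which avoids relying on `long` being as wide as a pointer (it is not on 64-bit Windows, for example). The sketch below is one possible spelling; `is_aligned_pow2` is a made-up name, and the mask trick only works when the alignment is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p is a multiple of alignment.
   alignment must be a non-zero power of two (1, 2, 4, 8, 16, ...). */
static inline bool is_aligned_pow2(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* For a power of two, alignment - 1 is a mask of the low bits that
       must all be zero, e.g. 16 - 1 == 15 == 0b1111. */
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For example, `is_aligned_pow2(p, 16)` performs the same test as `((unsigned long)p & 15) == 0`.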

answered Oct 16 '22 01:10

Pascal Cuoq
