G++ SSE memory alignment on the stack

I am attempting to re-write a raytracer using Streaming SIMD Extensions. My original raytracer used inline assembly and movups instructions to load data into the xmm registers. I have read that compiler intrinsics are not significantly slower than inline assembly (I suspect I may even gain speed by avoiding unaligned memory accesses), and much more portable, so I am attempting to migrate my SSE code to use the intrinsics in xmmintrin.h. The primary class affected is vector, which looks something like this:

#include "xmmintrin.h"
union vector {
    __m128 simd;
    float raw[4];
    //some constructors
    //a bunch of functions and operators
} __attribute__ ((aligned (16)));

I have read previously that the g++ compiler will automatically align structs to a boundary equal to the size of their largest member, but this does not seem to be happening, and the aligned attribute isn't helping. My research indicates that this is likely because I am allocating a whole bunch of function-local vectors on the stack, and that alignment on the stack is not guaranteed on x86. Is there any way to force this alignment? I should mention that this is running under native x86 Linux on a 32-bit machine, not Cygwin. I intend to implement multithreading in this application further down the line, so declaring the offending vector instances static isn't an option. I'm willing to increase the size of my vector data structure if needed.
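One way to confirm whether a given local really is misaligned is to test its address at run time. The following is only a sketch: `is_aligned`, `Vec4`, and `local_vector_is_aligned` are hypothetical names, and the `__m128` member is left out so the check does not depend on xmmintrin.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when `p` sits on a multiple of `alignment`,
// which must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Stand-in for the union in the question (assumption: only the float
// array is kept, with the same GCC aligned attribute).
struct Vec4 {
    float raw[4];
} __attribute__((aligned(16)));

// Reports whether a function-local instance ended up on a 16-byte boundary.
inline bool local_vector_is_aligned() {
    Vec4 v;
    v.raw[0] = 0.0f;  // touch the object so it is actually materialised
    return is_aligned(&v, 16);
}
```

Calling `local_vector_is_aligned()` from the functions that crash should show whether the compiler honoured the attribute for stack objects in that build.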

404

asked Feb 11 '11 05:02

Octavianus

2 Answers

The simplest way is std::aligned_storage (C++11, declared in <type_traits>), which takes the alignment as its second template parameter.

If you don't have it yet, you might want to check Boost's version.

Then you can build your union:

union vector {
  __m128 simd;
  // The aligned storage is the nested ::type, not the trait struct itself.
  std::aligned_storage<16, 16>::type alignment_only;
};
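As a rough illustration of how std::aligned_storage is typically used on its own, the sketch below constructs an object inside the raw storage with placement new. `Float4`, `Float4Storage`, and `sum_in_aligned_storage` are hypothetical names standing in for the real vector payload. Note that std::aligned_storage is deprecated as of C++23.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical stand-in for the four-float vector payload.
struct Float4 { float raw[4]; };

// 16 bytes of raw storage that the type system requires to be 16-byte aligned.
typedef std::aligned_storage<sizeof(Float4), 16>::type Float4Storage;

static_assert(alignof(Float4Storage) == 16, "storage must be 16-byte aligned");

inline float sum_in_aligned_storage() {
    Float4Storage storage;  // function-local, so it lives on the stack
    // Construct the payload inside the aligned storage.
    Float4* v = new (&storage) Float4{{1.0f, 2.0f, 3.0f, 4.0f}};
    assert(reinterpret_cast<std::uintptr_t>(v) % 16 == 0);
    // Float4 is trivially destructible, so no explicit destructor call is needed.
    return v->raw[0] + v->raw[1] + v->raw[2] + v->raw[3];
}
```

Whether the stack object actually lands on a 16-byte boundary still depends on the compiler honouring over-aligned locals, which is the original problem on 32-bit x86.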

Finally, if it does not work, you can always create your own little class:

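#include <cstdint>  // std::uintptr_t

template <typename Type, std::uintptr_t Align> // Align must be a power of 2
class RawStorage
{
public:
  Type* operator->() {
    return reinterpret_cast<Type*>(aligned());
  }

  Type const* operator->() const {
    return reinterpret_cast<Type const*>(aligned());
  }

  Type& operator*() { return *(operator->()); }
  Type const& operator*() const { return *(operator->()); }

private:
  // Returns the first address inside data that is a multiple of Align.
  // Pointers cannot be masked directly, so go through an integer.
  unsigned char* aligned() const {
    std::uintptr_t const address = reinterpret_cast<std::uintptr_t>(data);
    std::uintptr_t const offset = (Align - (address & (Align - 1))) & (Align - 1);
    return const_cast<unsigned char*>(data) + offset;
  }

  // Align - 1 spare bytes guarantee an aligned slot of sizeof(Type) exists.
  unsigned char data[sizeof(Type) + Align - 1];
};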

It will allocate a bit more storage than necessary, but this way alignment is guaranteed.

int main(int argc, char* argv[])
{
  RawStorage<__m128, 16> simd;
  *simd = /* ... */;

  return 0;
}
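The round-up arithmetic behind this kind of wrapper can be checked in isolation. This is a minimal sketch with hypothetical helper names (`align_up`, `first_aligned`), assuming the alignment is a power of two:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round `address` up to the next multiple of `align` (a power of two).
constexpr std::uintptr_t align_up(std::uintptr_t address, std::uintptr_t align) {
    return (address + align - 1) & ~(align - 1);
}

// Worked values: an aligned address is unchanged, anything else moves up.
static_assert(align_up(0x1000, 16) == 0x1000, "already aligned");
static_assert(align_up(0x1001, 16) == 0x1010, "rounded up by 15");
static_assert(align_up(0x100F, 16) == 0x1010, "rounded up by 1");

// Returns the first address at or after `buffer` that is a multiple of `align`.
inline unsigned char* first_aligned(unsigned char* buffer, std::uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    std::uintptr_t const address = reinterpret_cast<std::uintptr_t>(buffer);
    return buffer + (align_up(address, align) - address);
}
```

Because the adjustment is at most `align - 1` bytes, a buffer of `sizeof(Type) + Align - 1` bytes always has room for one aligned object.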

With luck, the compiler can optimize away the pointer adjustment when it can prove that the buffer is already correctly aligned.

125

answered Sep 21 '22 20:09

Matthieu M.

A few weeks ago, I rewrote an old ray tracing assignment from my university days, updating it to run on 64-bit Linux and to make use of SIMD instructions. (Incidentally, the old version ran under DOS on a 486, to give you an idea of when I last did anything with it.)

There may well be better ways of doing it, but here is what I did:

typedef float    v4f_t __attribute__((vector_size (16)));

class Vector {
    ...
    union {
        v4f_t     simd;
        float     f[4];
    } __attribute__ ((aligned (16)));

    ...
};

Disassembling my compiled binary showed that it was indeed making use of the movaps instruction.
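For illustration, a filled-in version of that pattern might look like the sketch below. The class name `Vec4`, its constructor, and its operators are assumptions rather than the original code. It relies on GCC's vector extensions and on GCC's documented support for type punning through a union, so it is specific to GCC-compatible compilers.

```cpp
#include <cassert>
#include <cstdint>

// GCC vector extension: four packed floats, 16 bytes wide.
typedef float v4f_t __attribute__((vector_size(16)));

class Vec4 {  // hypothetical name
public:
    Vec4(float x, float y, float z, float w) {
        f[0] = x; f[1] = y; f[2] = z; f[3] = w;
    }

    // Element-wise addition expressed on the whole vector at once.
    Vec4 operator+(const Vec4& other) const {
        Vec4 result(0.0f, 0.0f, 0.0f, 0.0f);
        result.simd = simd + other.simd;
        return result;
    }

    float operator[](int i) const {
        assert(i >= 0 && i < 4);
        return f[i];
    }

private:
    // Same layout as in the answer: vector view and scalar view share storage.
    union {
        v4f_t simd;
        float f[4];
    } __attribute__((aligned(16)));
};
```

Which instructions the compiler emits for `operator+` depends on the target and flags, so disassembling the binary remains the way to confirm that aligned moves are used.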

Hope this helps.

answered Sep 21 '22 20:09

Sparky

Tags:

c++

memory-management

alignment

assembly

sse
