preventing unaligned data on the heap

Tags:

c++

alignment

visual-c++

I am building a class hierarchy that uses SSE intrinsic functions, and thus some of the members of the class need to be 16-byte aligned. For stack instances I can use __declspec(align(#)), like so:

typedef __declspec(align(16)) float Vector[4];
class MyClass{
...
private:
Vector v;
};

Now, since __declspec(align(#)) is a compilation directive, the following code may result in an unaligned instance of Vector on the heap:

MyClass *myclass = new MyClass;

This too, I know I can easily solve by overloading the new and delete operators to use _aligned_malloc and _aligned_free accordingly. Like so:

//inside MyClass:
public:
void* operator new (size_t size){  // exception specification dropped: removed in C++17
    void * p = _aligned_malloc(size, 16);  // 16-byte aligned block from the MSVC CRT
    if (p == 0)  throw std::bad_alloc();
    return p; 
}

void operator delete (void *p){
    _aligned_free(p);  // must pair with _aligned_malloc
}
...

So far so good, but here is my problem. Consider the following code:

class NotMyClass{ //Not my code, which I have little or no influence over
...
MyClass myclass;
...
};
int main(){
    ...
    NotMyClass *nmc = new NotMyClass;
    ...
}

Since the myclass instance of MyClass is created statically inside a dynamic instance of NotMyClass, myclass WILL be 16-byte aligned relative to the beginning of nmc because of Vector's __declspec(align(16)) directive. But this is worthless, since nmc is dynamically allocated on the heap with NotMyClass's new operator, which doesn't necessarily ensure (and most probably does NOT ensure) 16-byte alignment.
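Whether a given instance actually ended up on a 16-byte boundary can be checked at run time by inspecting its address. A minimal sketch, assuming a hypothetical helper named `is_aligned` (it is not part of any code above):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if p lies on an `alignment`-byte boundary.
// `alignment` is assumed to be a non-zero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling `is_aligned(nmc, 16)` on the heap instance, or on the address of its myclass member if that is accessible, would show whether the SSE member landed on a usable address.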

So far, I can only think of two approaches to this problem:

1. Preventing MyClass users from being able to compile the following code:
```
MyClass myclass;
```
meaning that instances of MyClass can only be created dynamically, using the new operator, thus ensuring that all instances of MyClass are truly dynamically allocated with MyClass's overloaded new. I asked in another thread how to accomplish this and got a few great answers: C++, preventing class instance from being created on the stack (during compilation)
2. Reverting from having Vector members in my class to having only pointers to Vector as members, which I would allocate and deallocate using _aligned_malloc and _aligned_free in the ctor and dtor respectively. This method seems crude and prone to error, since I am not the only programmer writing these classes (MyClass derives from a Base class and many of these classes use SSE).

However, since both solutions have been frowned upon in my team, I come to you for suggestions of a different solution.

asked Jun 22 '10 18:06

eladidan

2 Answers

If you're set against heap allocation, another idea is to over-allocate on the stack and manually align (manual alignment is discussed in this SO post). The idea is to allocate byte data (unsigned char) with a size guaranteed to contain an aligned region of the necessary size (+15), then find the aligned position by rounding down from the most-shifted region ((x+15) - (x+15) % 16, or (x+15) & ~0x0F). I posted a working example of this approach with vector operations on codepad (for g++ -O2 -msse2). Here are the important bits:

class MyClass{
   ...
   unsigned char dPtr[sizeof(float)*4+15]; //over-allocated data
   float* vPtr;                            //float ptr to be aligned

   public:
      MyClass(void) : 
         vPtr( reinterpret_cast<float*>( 
            (reinterpret_cast<uintptr_t>(dPtr)+15) & ~ 0x0F
         ) ) 
      {}
   ...
};
...

The constructor ensures that vPtr is aligned (note the order of members in the class declaration is important).
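The rounding arithmetic can be checked in isolation from the class. A minimal sketch of both forms mentioned above; the function names `align_up_16` and `align_up_16_mod` are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Round x up to the next multiple of 16 (x itself if it is already aligned),
// using the mask form (x+15) & ~0x0F. The mask is widened to pointer size.
inline std::uintptr_t align_up_16(std::uintptr_t x) {
    assert(x <= UINTPTR_MAX - 15);  // guard against wrap-around near the top of the range
    return (x + 15) & ~static_cast<std::uintptr_t>(0x0F);
}

// The same result via the modulo form (x+15) - (x+15) % 16.
inline std::uintptr_t align_up_16_mod(std::uintptr_t x) {
    assert(x <= UINTPTR_MAX - 15);
    return (x + 15) - (x + 15) % 16;
}
```

For example, an address of 0x1001 rounds to 0x1010, while 0x1010 stays where it is. In the worst case the start moves forward by 15 bytes, which is why the buffer carries 15 bytes of slack.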

This approach works (heap/stack allocation of containing classes is irrelevant to alignment), is portable-ish (I think most compilers provide a pointer-sized unsigned integer type, uintptr_t), and will not leak memory. But it's not particularly safe (you must be sure to keep the aligned pointer valid under copy, etc.), wastes (nearly) as much memory as it uses, and some may find the reinterpret_casts distasteful.

The risks of aligned operation/unaligned data problems could be mostly eliminated by encapsulating this logic in a Vector object, thereby controlling access to the aligned pointer and ensuring that it gets aligned at construction and stays valid.
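A rough sketch of what such a wrapper might look like. The class name `AlignedVec4` and its interface are assumptions for illustration, not the code from the codepad example. The key point is that copying re-derives the pointer from the destination's own buffer instead of copying the source's pointer:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper: owns over-allocated bytes plus a pointer to the
// 16-byte aligned region inside them.
class AlignedVec4 {
    unsigned char storage_[sizeof(float) * 4 + 15];  // over-allocated data
    float* data_;                                    // aligned pointer into storage_

    // Round the buffer address up to the next 16-byte boundary.
    static float* align16(unsigned char* p) {
        std::uintptr_t a = (reinterpret_cast<std::uintptr_t>(p) + 15)
                           & ~static_cast<std::uintptr_t>(0x0F);
        return reinterpret_cast<float*>(a);
    }

public:
    AlignedVec4() : data_(align16(storage_)) {
        for (int i = 0; i < 4; ++i) data_[i] = 0.0f;
    }

    // Copy the four floats, never the pointer: data_ must always point
    // into this object's own storage_.
    AlignedVec4(const AlignedVec4& other) : data_(align16(storage_)) {
        std::memcpy(data_, other.data_, sizeof(float) * 4);
    }

    AlignedVec4& operator=(const AlignedVec4& other) {
        if (this != &other) std::memcpy(data_, other.data_, sizeof(float) * 4);
        return *this;
    }

    float* data() { return data_; }
    const float* data() const { return data_; }

    float& operator[](int i) {
        assert(i >= 0 && i < 4);
        return data_[i];
    }
};
```

With this in place, MyClass could hold an `AlignedVec4` member and pass `v.data()` to the aligned SSE load/store intrinsics. One caveat that remains: if an object is relocated byte-wise (for example by memcpy or realloc of a containing buffer), the stored pointer becomes stale.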

answered Oct 07 '22 20:10

academicRobot

You can use "placement new."

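#include <new>       // declares placement new; user code must not redefine it
#include <malloc.h>  // _aligned_malloc / _aligned_free (MSVC CRT, as in the question)

int main() {
    void* p = _aligned_malloc(sizeof(NotMyClass), 16);  // 16-byte aligned raw storage
    if (!p) return 1;                                   // allocation failed
    NotMyClass* nmc = new (p) NotMyClass;               // construct in that storage
    // ...

    nmc->~NotMyClass();  // destroy the object explicitly
    _aligned_free(p);    // then release the storage
}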

Of course you need to take care when destroying the object, by calling the destructor and then releasing the space. You can't just call delete. You could use shared_ptr<> with a custom deleter function to deal with that automatically; it depends on whether the overhead of dealing with a shared_ptr (or other wrapper of the pointer) is a problem for you.
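A sketch of the shared_ptr variant, under some assumptions: the helper names `aligned_alloc16`, `aligned_free16` and `make_aligned16` are invented here, and the non-MSVC branch uses `posix_memalign` so that the example also builds outside Visual C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc, _aligned_free
#else
#include <stdlib.h>   // posix_memalign, free
#endif

// Hypothetical helper: 16-byte aligned raw storage, or a null pointer on failure.
inline void* aligned_alloc16(std::size_t size) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    if (::posix_memalign(&p, 16, size) != 0) p = nullptr;
    return p;
#endif
}

// Hypothetical helper: releases storage obtained from aligned_alloc16.
inline void aligned_free16(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    ::free(p);
#endif
}

// Construct a T in aligned storage and hand it to a shared_ptr whose deleter
// runs the destructor and then frees the storage, in that order.
template <typename T>
std::shared_ptr<T> make_aligned16() {
    void* raw = aligned_alloc16(sizeof(T));
    if (!raw) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(raw) % 16 == 0);

    T* obj = nullptr;
    try {
        obj = new (raw) T();      // placement new into the aligned block
    } catch (...) {
        aligned_free16(raw);      // constructor threw: release the storage
        throw;
    }
    // If creating the control block throws, shared_ptr invokes the deleter itself.
    return std::shared_ptr<T>(obj, [](T* p) {
        p->~T();
        aligned_free16(p);
    });
}
```

With this, `std::shared_ptr<NotMyClass> nmc = make_aligned16<NotMyClass>();` would replace the manual destructor call and free. The cost is the separately allocated control block and reference counting mentioned above.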

answered Oct 07 '22 21:10

janm
