Apologies in advance for what may be a silly first post on well-trodden ground. While there is plenty of material on the subject, very little of it is definitive and/or intelligible to me. I have an <code>AlignedArray</code> template class to dynamically allocate memory on the heap with arbitrary alignment (I need 32-byte alignment for AVX assembly routines). This requires some ugly pointer manipulation. Agner Fog provides a sample class in cppexamples.zip that abuses a union to do so (http://www.agner.org/optimize/optimization_manuals.zip). However, I know that writing to one member of a union and then reading from another results in UB. AFAICT it is safe to alias any pointer type to a <code>char *</code>, but only in one direction. This is where my understanding gets fuzzy. Here's an abridged version of my <code>AlignedArray</code> class (essentially a rewrite of Agner's, to help my understanding): <pre class="prettyprint"><code>template <typename T, size_t alignment = 32> class AlignedArray { size_t m_size; char * m_unaligned; T * m_aligned; public: AlignedArray (size_t const size) : m_size(0) , m_unaligned(0) , m_aligned(0) { this->size(size); } ~AlignedArray () { this->size(0); } T const & operator [] (size_t const i) const { return m_aligned[i]; } T & operator [] (size_t const i) { return m_aligned[i]; } size_t const size () { return m_size; } void size (size_t const size) { if (size > 0) { if (size != m_size) { char * unaligned = 0; unaligned = new char [size * sizeof(T) + alignment - 1]; if (unaligned) { // Agner: /* union { char * c; T * t; size_t s; } aligned; aligned.c = unaligned + alignment - 1; aligned.s &= ~(alignment - 1); */ // Me: T * aligned = reinterpret_cast<T *>((reinterpret_cast<size_t>(unaligned) + alignment - 1) & ~(alignment - 1)); if (m_unaligned) { // Agner: //memcpy(aligned.c, m_aligned, std::min(size, m_size)); // Me: memcpy(aligned, m_aligned, std::min(size, m_size)); delete [] m_unaligned; } m_size = size; m_unaligned = unaligned; // Agner: //m_aligned = aligned.t; // Me: m_aligned = aligned; } return; } return; } if (m_unaligned) { delete [] m_unaligned; m_size = 0; m_unaligned = 0; m_aligned = 0; } } }; </code></pre> So which method is safe(r)?

I have code that implements the (replacement) <code>new</code> and <code>delete</code> operators, suitable for SIMD (i.e., SSE / AVX). It uses the following functions that you might find useful: <pre class="prettyprint"><code>static inline void *G0__SIMD_malloc (size_t size) { constexpr size_t align = G0_SIMD_ALIGN; void *ptr, *uptr; static_assert(G0_SIMD_ALIGN >= sizeof(void *), "insufficient alignment for pointer storage"); static_assert((G0_SIMD_ALIGN & (G0_SIMD_ALIGN - 1)) == 0, "G0_SIMD_ALIGN value must be a power of (2)"); size += align; // raw pointer storage with alignment padding. if ((uptr = malloc(size)) == nullptr) return nullptr; // size_t addr = reinterpret_cast<size_t>(uptr); uintptr_t addr = reinterpret_cast<uintptr_t>(uptr); ptr = reinterpret_cast<void *> ((addr + align) & ~(align - 1)); *(reinterpret_cast<void **>(ptr) - 1) = uptr; // (raw ptr) return ptr; } static inline void G0__SIMD_free (void *ptr) { if (ptr != nullptr) free(*(reinterpret_cast<void **>(ptr) - 1)); // (raw ptr) } </code></pre> This should be easy to adapt. Obviously you would replace <code>malloc</code> and <code>free</code>, since you're using the global <code>new</code> and <code>delete</code> for raw (char) storage. It assumes that <code>size_t</code> is sufficiently wide for address arithmetic - true in practice, but <code>uintptr_t</code> from <code><cstdint></code> would be more correct.

To answer your question, both of those methods are just as safe. The only two operations that are really stinky there are the cast to <code>size_t</code> and <code>new char[stuff]</code>. You should at least be using <code>uintptr_t</code> from <code><cstdint></code> for the first. The second operation creates your only pointer aliasing issue as technically the <code>char</code> constructor is run on each <code>char</code> element and that constitutes accessing the data through the <code>char</code> pointer. You should use <code>malloc</code> instead. The other supposed 'pointer aliasing' isn't an issue. And that's because other than the <code>new</code> operation you aren't accessing any data through the aliased pointers. You are only accessing data through the <code>T *</code> you get after alignment. Of course, you have to remember to construct all of your array elements. This is true even in your version. Who knows what kind of <code>T</code> people will put there. And, of course, if you do that, you'll have to remember to call their destructors, and have to remember to handle exceptions when you copy them (<code>memcpy</code> doesn't cut it). If you have a particular C++11 feature, you do not need to do this. C++11 has a function specifically for aligning pointers to arbitrary boundaries. The interface is a little funky, but it should do the job. The call is <code>::std::align</code> defined in <code><memory></code>.Thanks to R. Martinho Fernandes for pointing it out. Here is a version of your function with the suggested fixed: <pre class="prettyprint"><code>#include <cstdint> // For uintptr_t #include <cstdlib> // For malloc #include <algorithm> template <typename T, size_t alignment = 32> class AlignedArray { size_t m_size; void * m_unaligned; T * m_aligned; public: AlignedArray (size_t const size) : m_size(0) , m_unaligned(0) , m_aligned(0) { this->size(size); } ~AlignedArray () { this->size(0); } T const & operator [] (size_t const i) const { return m_aligned[i]; } T & operator [] (size_t const i) { return m_aligned[i]; } size_t size() const { return m_size; } void size (size_t const size) { using ::std::uintptr_t; using ::std::malloc; if (size > 0) { if (size != m_size) { void * unaligned = 0; unaligned = malloc(size * sizeof(T) + alignment - 1); if (unaligned) { T * aligned = reinterpret_cast<T *>((reinterpret_cast<uintptr_t>(unaligned) + alignment - 1) & ~(alignment - 1)); if (m_unaligned) { ::std::size_t constructed = 0; const ::std::size_t num_to_copy = ::std::min(size, m_size); try { for (constructed = 0; constructed < num_to_copy; ++constructed) { new(aligned + constructed) T(m_aligned[constructed]); } for (; constructed < size; ++constructed) { new(aligned + constructed) T; } } catch (...) { for (::std::size_t i = 0; i < constructed; ++i) { aligned[i].T::~T(); } ::std::free(unaligned); throw; } for (size_t i = 0; i < m_size; ++i) { m_aligned[i].T::~T(); } free(m_unaligned); } m_size = size; m_unaligned = unaligned; m_aligned = aligned; } } } else if (m_unaligned) { // and size <= 0 for (::std::size_t i = 0; i < m_size; ++i) { m_aligned[i].T::~T(); } ::std::free(m_unaligned); m_size = 0; m_unaligned = 0; m_aligned = 0; } } }; </code></pre>

C++: Strict aliasing vs union abuse

Tags:

c++

pointers

unions

memory-alignment

Apologies in advance for what may be a silly first post on well-trodden ground. While there is plenty of material on the subject, very little of it is definitive and/or intelligible to me.

I have an AlignedArray template class to dynamically allocate memory on the heap with arbitrary alignment (I need 32-byte alignment for AVX assembly routines). This requires some ugly pointer manipulation.

Agner Fog provides a sample class in cppexamples.zip that abuses a union to do so (http://www.agner.org/optimize/optimization_manuals.zip). However, I know that writing to one member of a union and then reading from another results in UB.

AFAICT it is safe to alias any pointer type to a char *, but only in one direction. This is where my understanding gets fuzzy. Here's an abridged version of my AlignedArray class (essentially a rewrite of Agner's, to help my understanding):

template <typename T, size_t alignment = 32>
class AlignedArray
{
    size_t m_size;
    char * m_unaligned;
    T * m_aligned;

public:
    AlignedArray (size_t const size)
        : m_size(0)
        , m_unaligned(0)
        , m_aligned(0)
    {
        this->size(size);
    }

    ~AlignedArray ()
    {
        this->size(0);
    }

    T const & operator [] (size_t const i) const { return m_aligned[i]; }

    T & operator [] (size_t const i) { return m_aligned[i]; }

    size_t const size () { return m_size; }

    void size (size_t const size)
    {
        if (size > 0)
        {
            if (size != m_size)
            {
                char * unaligned = 0;
                unaligned = new char [size * sizeof(T) + alignment - 1];
                if (unaligned)
                {
                    // Agner:
                    /*
                    union {
                        char * c;
                        T * t;
                        size_t s;
                    } aligned;
                    aligned.c = unaligned + alignment - 1;
                    aligned.s &= ~(alignment - 1);
                    */

                    // Me:
                    T * aligned = reinterpret_cast<T *>((reinterpret_cast<size_t>(unaligned) + alignment - 1) & ~(alignment - 1));

                    if (m_unaligned)
                    {
                        // Agner:
                        //memcpy(aligned.c, m_aligned, std::min(size, m_size));

                        // Me:
                        memcpy(aligned, m_aligned, std::min(size, m_size));

                        delete [] m_unaligned;
                    }
                    m_size = size;
                    m_unaligned = unaligned;

                    // Agner:
                    //m_aligned = aligned.t;

                    // Me:
                    m_aligned = aligned;
                }
                return;
            }
            return;
        }
        if (m_unaligned)
        {
            delete [] m_unaligned;
            m_size = 0;
            m_unaligned = 0;
            m_aligned = 0;
        }
    }
};

So which method is safe(r)?

240

asked Mar 07 '13 15:03

linguamachina

2 Answers

I have code that implements the (replacement) new and delete operators, suitable for SIMD (i.e., SSE / AVX). It uses the following functions that you might find useful:

static inline void *G0__SIMD_malloc (size_t size)
{
    constexpr size_t align = G0_SIMD_ALIGN;
    void *ptr, *uptr;

    static_assert(G0_SIMD_ALIGN >= sizeof(void *),
                  "insufficient alignment for pointer storage");

    static_assert((G0_SIMD_ALIGN & (G0_SIMD_ALIGN - 1)) == 0,
                  "G0_SIMD_ALIGN value must be a power of (2)");

    size += align; // raw pointer storage with alignment padding.

    if ((uptr = malloc(size)) == nullptr)
        return nullptr;

    // size_t addr = reinterpret_cast<size_t>(uptr);
    uintptr_t addr = reinterpret_cast<uintptr_t>(uptr);

    ptr = reinterpret_cast<void *>
        ((addr + align) & ~(align - 1));

    *(reinterpret_cast<void **>(ptr) - 1) = uptr; // (raw ptr)

    return ptr;
}


static inline void G0__SIMD_free (void *ptr)
{
    if (ptr != nullptr)
        free(*(reinterpret_cast<void **>(ptr) - 1)); // (raw ptr)
}

This should be easy to adapt. Obviously you would replace malloc and free, since you're using the global new and delete for raw (char) storage. It assumes that size_t is sufficiently wide for address arithmetic - true in practice, but uintptr_t from <cstdint> would be more correct.

141

answered Oct 19 '22 20:10

Brett Hale

To answer your question, both of those methods are just as safe. The only two operations that are really stinky there are the cast to size_t and new char[stuff]. You should at least be using uintptr_t from <cstdint> for the first. The second operation creates your only pointer aliasing issue as technically the char constructor is run on each char element and that constitutes accessing the data through the char pointer. You should use malloc instead.

The other supposed 'pointer aliasing' isn't an issue. And that's because other than the new operation you aren't accessing any data through the aliased pointers. You are only accessing data through the T * you get after alignment.

Of course, you have to remember to construct all of your array elements. This is true even in your version. Who knows what kind of T people will put there. And, of course, if you do that, you'll have to remember to call their destructors, and have to remember to handle exceptions when you copy them (memcpy doesn't cut it).

If you have a particular C++11 feature, you do not need to do this. C++11 has a function specifically for aligning pointers to arbitrary boundaries. The interface is a little funky, but it should do the job. The call is ::std::align defined in <memory>.Thanks to R. Martinho Fernandes for pointing it out.

Here is a version of your function with the suggested fixed:

#include <cstdint>  // For uintptr_t
#include <cstdlib>  // For malloc
#include <algorithm>

template <typename T, size_t alignment = 32>
class AlignedArray
{
    size_t m_size;
    void * m_unaligned;
    T * m_aligned;

public:
    AlignedArray (size_t const size)
        : m_size(0)
        , m_unaligned(0)
        , m_aligned(0)
    {
        this->size(size);
    }

    ~AlignedArray ()
    {
        this->size(0);
    }

    T const & operator [] (size_t const i) const { return m_aligned[i]; }

    T & operator [] (size_t const i) { return m_aligned[i]; }

    size_t size() const { return m_size; }

    void size (size_t const size)
    {
        using ::std::uintptr_t;
        using ::std::malloc;

        if (size > 0)
        {
            if (size != m_size)
            {
                void * unaligned = 0;
                unaligned = malloc(size * sizeof(T) + alignment - 1);
                if (unaligned)
                {
                    T * aligned = reinterpret_cast<T *>((reinterpret_cast<uintptr_t>(unaligned) + alignment - 1) & ~(alignment - 1));

                    if (m_unaligned)
                    {
                        ::std::size_t constructed = 0;
                        const ::std::size_t num_to_copy = ::std::min(size, m_size);

                        try {
                            for (constructed = 0; constructed < num_to_copy; ++constructed) {
                                new(aligned + constructed) T(m_aligned[constructed]);
                            }
                            for (; constructed < size; ++constructed) {
                                new(aligned + constructed) T;
                            }
                        } catch (...) {
                            for (::std::size_t i = 0; i < constructed; ++i) {
                                aligned[i].T::~T();
                            }
                            ::std::free(unaligned);
                            throw;
                        }

                        for (size_t i = 0; i < m_size; ++i) {
                            m_aligned[i].T::~T();
                        }
                        free(m_unaligned);
                    }
                    m_size = size;
                    m_unaligned = unaligned;
                    m_aligned = aligned;
                }
            }
        } else if (m_unaligned) { // and size <= 0
            for (::std::size_t i = 0; i < m_size; ++i) {
                m_aligned[i].T::~T();
            }
            ::std::free(m_unaligned);
            m_size = 0;
            m_unaligned = 0;
            m_aligned = 0;
        }
    }
};

answered Oct 19 '22 20:10

Omnifarious

Related questions
                            
                                Is there a C++ allocator that respects an overridden new/delete?
                            
                                Implicit conversion to explicit bool-types for sorting containers?
                            
                                Excluding std::pair constructors that use explicit construction
                            
                                Implementing a multiple producer/consumer lockless queue in C\C++
                            
                                printing all binary trees from inorder traversal
                            
                                Are there any frameworks to synchronize data generated on one peer with all other peers in an unreliable network?
                            
                                How to disable return value optimization in Visual Studio 2010?
                            
                                Virtual Inheritance and dreaded diamond
                            
                                How to insert a duplicate element into a vector?
                            
                                c++: OpenMP and non-random-access STL containers - a possible workaround
                            
                                Why does Glibmm/Gtkmm not include the unary dereferencing operator, *, for Glib::RefPtr?
                            
                                Adding MFC support to a Qt project
                            
                                What is the difference between a "container" and a "data structure"?
                            
                                Modifying package gbm of R
                            
                                Overloading std::function argument to match lambda [duplicate]
                            
                                How to perform template substitution in the clang library?
                            
                                Forcing a constant expression to be evaluated during compile-time?
                            
                                "Looking At" an object with a Quaternion
                            
                                More efficient and fast way of inverting matrices in c++ (big and small)
                            
                                In which namespace do hash<T> functors for user types belong?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With