C++0x atomic implementation in C++98: question about __sync_synchronize()

Tags:

c++

c++11

templates

atomic

I have written the following atomic template with a view to mimicking the atomic operations which will be available in the upcoming C++0x standard.

However, I am not sure that the __sync_synchronize() call I have around the returning of the underlying value is necessary.

From my understanding, __sync_synchronize() is a full memory barrier and I'm not sure I need such a costly call when returning the object value.

I'm pretty sure it'll be needed around the setting of the value, but I could also implement this with the assembly:

__asm__ __volatile__ ( "rep;nop": : :"memory" );

Does anyone know whether I definitely need the synchronize() on return of the object?

template < typename T >
struct atomic
{
private:
    volatile T obj;

public:
    atomic( const T & t ) :
        obj( t )
    {
    }

    inline operator T()
    {
        __sync_synchronize();   // Not sure this is overkill
        return obj;
    }

    inline atomic< T > & operator=( T val )
    {
        __sync_synchronize();   // Not sure if this is overkill
        obj = val;
        return *this;
    }

    inline T operator++()
    {
        return __sync_add_and_fetch( &obj, (T)1 );
    }

    inline T operator++( int )
    {
        return __sync_fetch_and_add( &obj, (T)1 );
    }

    inline T operator+=( T val )
    {
        return __sync_add_and_fetch( &obj, val );
    }

    inline T operator--()
    {
        return __sync_sub_and_fetch( &obj, (T)1 );
    }

    inline T operator--( int )
    {
        return __sync_fetch_and_sub( &obj, (T)1 );
    }

    inline T operator-=( T val )
    {
        return __sync_sub_and_fetch( &obj, val );
    }

    // Perform an atomic CAS operation
    // returning the value before the operation
    inline T exchange( T oldVal, T newVal )
    {
        return __sync_val_compare_and_swap( &obj, oldVal, newVal );
    }

};

Update: I want to make sure that the operations are consistent in the face of read/write re-ordering due to compiler optimisations.


asked Mar 11 '10 08:03

ScaryAardvark

1 Answer

First, some petty remarks:

volatile T obj;

volatile is useless here, all the more so since you insert all the barriers yourself.

inline T operator++( int )

inline is unneeded, since it is implied when the method is defined inside the class.

Getters and setters:

inline operator T()
{
    __sync_synchronize();   // (I)
    T tmp=obj;
    __sync_synchronize();   // (II)
    return tmp;
}

inline atomic< T > & operator=( T val )
{
    __sync_synchronize();   // (III)
    obj = val;
    __sync_synchronize();   // (IV)
    return *this;
}

To ensure a total ordering of the memory accesses on both reads and writes, you need two barriers on each access, as shown above. I would be happy with only barriers (II) and (III), as they suffice for some uses I came up with (e.g. a pointer or boolean saying that data is there, or a spinlock), but, unless the interface specifies otherwise, I would not omit the others, because someone might need them. (It would be nice if someone showed that some of the barriers can be omitted without restricting possible uses, but I don't think that is possible.)

Of course, this would be unnecessarily complicated and slow.
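
As a rough sketch of the "boolean saying data is there" use, here is how barriers (II) and (III) alone could carry a payload from one thread to another. The names handoff, publish and try_consume are made up for illustration, and the sketch assumes GCC's __sync builtins and a target where word-sized loads and stores are atomic, which the language itself does not guarantee.

```cpp
#include <cassert>

// Hypothetical names (handoff, publish, try_consume), for illustration only.
struct handoff
{
    int payload;   // plain data guarded by the flag
    int ready;     // non-zero once payload may be read

    handoff() : payload(0), ready(0) {}

    // Writer: barrier (III) before the flag store keeps the payload
    // store from being reordered after it.
    void publish(int value)
    {
        payload = value;
        __sync_synchronize();   // (III)
        ready = 1;
    }

    // Reader: barrier (II) after the flag load keeps the payload load
    // from being reordered before it.
    bool try_consume(int &out)
    {
        int seen = ready;
        __sync_synchronize();   // (II)
        if (!seen)
            return false;
        out = payload;
        return true;
    }
};

// Single-threaded walk-through of the protocol; in real use publish()
// and try_consume() would run on different threads.
void check_handoff()
{
    handoff h;
    int out = -1;
    bool got = h.try_consume(out);
    assert(!got && out == -1);   // nothing published yet
    h.publish(42);
    got = h.try_consume(out);
    assert(got && out == 42);    // payload visible once the flag is set
}
```

Barriers (I) and (IV) play no part in this particular protocol, which is why it gets by with two of the four.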

That said, I would just drop the barriers, and indeed the whole idea of putting barriers anywhere in a template like this. Note that:

- The ordering semantics of this interface are defined entirely by you; if you decide that the interface has barriers here or there, they must be here or there, period. If you leave them out of the interface, you can come up with a more efficient design, because a particular problem might not need all the barriers, or might not need full barriers at all.
- Usually, you use atomics because you have a lock-free algorithm that could give you a performance advantage; an interface that prematurely pessimizes every access will probably be unusable as a building block for it, because it defeats that performance advantage.
- Lock-free algorithms typically involve communication that cannot be encapsulated in one atomic data type, so you need to know what is happening in the algorithm to place the barriers precisely where they belong (e.g. when implementing a lock, you need a barrier after you have acquired it and another before you release it, and both operations are writes, at least in principle).
- If you don't want problems and are not sure about placing the barriers explicitly in the algorithm, just use lock-based algorithms. There is nothing wrong with that.
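
The lock case from the list above can be sketched with GCC's lock builtins, which place the barriers on the algorithm's side rather than inside a generic getter or setter. This is a minimal sketch with a hypothetical spinlock class: __sync_lock_test_and_set is documented by GCC as an acquire barrier and __sync_lock_release as a release barrier, so neither is a full barrier.

```cpp
#include <cassert>

// Hypothetical minimal spinlock, for illustration only.
class spinlock
{
    int state;   // 0 = free, 1 = held

public:
    spinlock() : state(0) {}

    // Atomic exchange with acquire semantics: accesses that follow
    // it cannot be moved before it.
    bool try_lock()
    {
        return __sync_lock_test_and_set(&state, 1) == 0;
    }

    void lock()
    {
        while (!try_lock()) { /* spin until the holder releases */ }
    }

    // Store of 0 with release semantics: accesses that precede it
    // cannot be moved after it.
    void unlock()
    {
        __sync_lock_release(&state);
    }
};

// Single-threaded walk-through of the lock's state transitions.
void check_spinlock()
{
    spinlock l;
    bool first = l.try_lock();
    bool second = l.try_lock();
    assert(first && !second);   // only the first attempt wins
    l.unlock();
    bool third = l.try_lock();
    assert(third);              // free again after unlock
    l.unlock();
}
```

Both operations write to the lock word, yet each needs a different one-sided barrier, which a generic operator= with fixed barriers cannot express.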

BTW, the C++0x interface allows you to specify precise memory ordering constraints.
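
For instance, with std::atomic as standardized in C++11, each operation takes its own ordering argument. The sketch below uses a hypothetical ref_count class (not part of the question's template) to show two different constraints on one variable. It needs a C++11 compiler.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference counter, for illustration only.
class ref_count
{
    std::atomic<int> count;

public:
    ref_count() : count(1) {}

    // Taking another reference needs atomicity but no ordering.
    void add_ref()
    {
        count.fetch_add(1, std::memory_order_relaxed);
    }

    // Dropping a reference uses acquire-release ordering, so the
    // thread that drops the last one sees every write made through
    // the other references. Returns true for the last reference.
    bool release()
    {
        return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    int get() const
    {
        return count.load(std::memory_order_acquire);
    }
};

// Single-threaded walk-through of the counter.
void check_ref_count()
{
    ref_count rc;
    rc.add_ref();
    assert(rc.get() == 2);
    bool last = rc.release();
    assert(!last);            // one reference still alive
    last = rc.release();
    assert(last);             // that was the final reference
}
```

Leaving the argument out gives std::memory_order_seq_cst, the strongest ordering the interface offers.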


answered Oct 16 '22 06:10

jpalecek
