Atomic int64_t on ARM Cortex M3

Question

Since my compiler still doesn't support C++11 and std::atomic, I'm forced to implement it manually with an ldrex/strex pair.

My question is: what is the correct way to 'atomically' read-modify-write an int64_t with ldrex and strex?

A simple solution like this doesn't seem to work (one of the STREXW calls returns 1 all the time):

volatile int64_t value;
int64_t temp;

do
{
    uint32_t low = __LDREXW( (uint32_t *)&value );
    uint32_t high = __LDREXW( ((uint32_t *)&value)+1 );

    /* combine as unsigned so the low word is not sign-extended */
    temp = (int64_t)( (uint64_t)low | ((uint64_t)high << 32) );
    temp++;

} while( __STREXW( temp, (uint32_t *)&value) |  __STREXW( temp>>32, ((uint32_t *)&value)+1) );

I couldn't find anything in the manual about several sequential LDREX or STREX instructions pointing to different addresses, but it seemed to me that it should be allowed.

Otherwise multiple threads would not be able to change two different atomic variables in some scenarios.

Notlikethat · Accepted Answer

This will never work, because you cannot nest exclusives that way. Implementation-wise, the Cortex-M3 local exclusive monitor doesn't even keep track of an address - the exclusive reservation granule is the entire address space - so the assumption of tracking each word separately is already invalid. However, you don't even need to consider any implementation details, because the architecture already explicitly rules out the back-to-back strex:

If two STREX instructions are executed without an intervening LDREX the second STREX returns a status value of 1. This means that:

Every STREX must have a preceding LDREX associated with it in a given thread of execution.

It is not necessary for every LDREX to have a subsequent STREX.
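To make the quoted rule concrete, here is a toy host-side model of it, not real hardware code. It assumes the local monitor is a single open/closed flag with no address, as described above. The names model_ldrex, model_strex and two_word_attempt are made up for illustration, and the carry into the high word is left out to keep it short.

```c
#include <stdint.h>

/* Toy model only: the local monitor as one open/closed flag, no address. */
static int monitor_open;

/* Models LDREX: read the word and open the monitor. */
static uint32_t model_ldrex(const volatile uint32_t *addr)
{
    monitor_open = 1;
    return *addr;
}

/* Models STREX: store and return 0 only if the monitor is open.
   A successful store closes the monitor again. */
static uint32_t model_strex(uint32_t v, volatile uint32_t *addr)
{
    if (!monitor_open)
        return 1;              /* no LDREX since the last STREX */
    monitor_open = 0;
    *addr = v;
    return 0;
}

/* Replays one pass of the question's loop body on two words
   (carry into the high word ignored). Returns the OR of both statuses. */
static uint32_t two_word_attempt(volatile uint32_t word[2])
{
    uint32_t low    = model_ldrex(&word[0]);
    uint32_t high   = model_ldrex(&word[1]);
    uint32_t first  = model_strex(low + 1, &word[0]);  /* succeeds */
    uint32_t second = model_strex(high, &word[1]);     /* always fails */
    return first | second;
}
```

In this model two_word_attempt() reports failure on every call, which matches the loop in the question never exiting. Note that the first store still lands each time, so the low word keeps changing while the high word is never written.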

Since Cortex-M3 (and ARMv7-M in general) doesn't have ldrexd like ARMv7-A, you'll either have to use a separate lock to control all accesses to the variable, or just disable interrupts around the read-modify-write. If at all possible it would really be better to redesign things not to need an atomic 64-bit type in the first place, since you'd still only achieve atomicity with respect to other threads on the same core - you simply cannot make any 64-bit operation atomic from the point of view of an external agent like a DMA controller.
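A minimal sketch of the "disable interrupts around the read-modify-write" option, assuming a GCC-compatible compiler that defines __ARM_ARCH_7M__ when targeting Cortex-M3. The helper names irq_save, irq_restore and add64_irq_masked are hypothetical. The #else branch is only a host stand-in so the code compiles off-target; it masks nothing.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7M__)
/* Cortex-M path: remember PRIMASK, then mask interrupts. */
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm volatile ("mrs %0, primask" : "=r" (primask));
    __asm volatile ("cpsid i" ::: "memory");
    return primask;
}

/* Put PRIMASK back to what it was, so nested use stays correct. */
static inline void irq_restore(uint32_t primask)
{
    __asm volatile ("msr primask, %0" :: "r" (primask) : "memory");
}
#else
/* Host stand-in (assumption for off-target builds): tracks the flag only. */
static uint32_t fake_primask;

static inline uint32_t irq_save(void)
{
    uint32_t old = fake_primask;
    fake_primask = 1;
    return old;
}

static inline void irq_restore(uint32_t primask)
{
    fake_primask = primask;
}
#endif

/* Adds delta to *p with interrupts masked and returns the new value.
   Only atomic with respect to code on the same core; a DMA controller
   or another bus master can still observe a half-updated value. */
int64_t add64_irq_masked(volatile int64_t *p, int64_t delta)
{
    uint32_t state = irq_save();
    int64_t result = *p + delta;
    *p = result;
    irq_restore(state);
    return result;
}
```

With this, the increment from the question becomes add64_irq_masked(&value, 1). Restoring the saved PRIMASK instead of unconditionally re-enabling keeps the helper safe to call from code that already runs with interrupts masked. NMI and HardFault are not masked by PRIMASK, so the variable should not be touched from those handlers.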

Peter Cordes · Answer

I'd just look at how gcc does it, and use the same instruction sequences.

gcc 4.8.2 claims to implement std::atomic<int64_t> with is_lock_free() returning true, even with -mcpu=cortex-m3. Unfortunately, it doesn't really work. It makes code that doesn't link or doesn't work, because there is no implementation of the helper functions it tries to use. (Thanks @Notlikethat for trying it out.)

Here's the test code I tried. See an old version of this answer if that link is dead. I'm leaving this answer around in case the idea is useful for anyone in related cases where gcc does make useful code.

Tags: assembly, atomic, arm

Amomum
