 

What, if any, are the alignment requirements for the atomic intrinsic functions?

Atomic operations for the Delphi mobile targets are built on top of the AtomicXXX family of intrinsic functions. The documentation says:

Because the Delphi mobile compilers do not support a built-in assembler, the System unit provides four atomic intrinsic functions that provide a way to atomically exchange, compare and exchange, increment, and decrement memory values.

These four functions are:

  • AtomicIncrement
  • AtomicDecrement
  • AtomicCmpExchange
  • AtomicExchange
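For readers more at home in C, the four intrinsics map roughly onto C11 <stdatomic.h> operations. The sketch below is only an analogue, not how the Delphi compiler implements them. The wrapper names (delphi_atomic_*) are hypothetical. The return conventions are chosen to mirror Delphi's: AtomicIncrement/AtomicDecrement return the new value, while AtomicExchange/AtomicCmpExchange return the original value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C11 analogues of the four Delphi atomic intrinsics. */

/* Like AtomicIncrement: returns the incremented (new) value. */
static int32_t delphi_atomic_increment(_Atomic int32_t *target) {
    /* atomic_fetch_add returns the old value, so add 1 for the new one */
    return atomic_fetch_add(target, 1) + 1;
}

/* Like AtomicDecrement: returns the decremented (new) value. */
static int32_t delphi_atomic_decrement(_Atomic int32_t *target) {
    return atomic_fetch_sub(target, 1) - 1;
}

/* Like AtomicExchange: stores value and returns the previous value. */
static int32_t delphi_atomic_exchange(_Atomic int32_t *target, int32_t value) {
    return atomic_exchange(target, value);
}

/* Like AtomicCmpExchange(Target, NewValue, Comparand): stores new_value
   only if *target == comparand, and returns the value that was there. */
static int32_t delphi_atomic_cmp_exchange(_Atomic int32_t *target,
                                          int32_t new_value,
                                          int32_t comparand) {
    int32_t expected = comparand;
    /* On failure, expected is overwritten with the current value;
       on success it still equals comparand, i.e. the old value. */
    atomic_compare_exchange_strong(target, &expected, new_value);
    return expected;
}
```

With a variable holding 17, delphi_atomic_cmp_exchange(&a, 42, 17) stores 42 and returns 17. A second call with comparand 17 then leaves a unchanged and returns 42.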

Other RTL functions that provide atomic operations, e.g. the static class methods of the TInterlocked class, are built on top of these four intrinsics.

For the mobile compilers that target ARMv7, are there any alignment requirements for these four atomic intrinsics? If so, what are they?

The documentation does not list any such requirements. However, the documentation has been known to be inaccurate, and I am not confident enough to take the absence of stated requirements as definitive proof that none exist.

As a mild aside, the XE8 documentation for intrinsic functions states that these atomic intrinsics are not supported by the desktop compilers. That is not correct – these intrinsics are supported by the desktop compilers.

Sean B. Durkin asked Aug 24 '15 02:08



2 Answers

XE8 compiles

var 
  a: integer;

AtomicIncrement(a);

to

3e: 2201        movs    r2, #1
40: 900c        str r0, [sp, #48]   ; 0x30
42: 910b        str r1, [sp, #44]   ; 0x2c
44: 920a        str r2, [sp, #40]   ; 0x28
46: 980b        ldr r0, [sp, #44]   ; 0x2c
48: e850 1f00   ldrex   r1, [r0]
4c: 9a0a        ldr r2, [sp, #40]   ; 0x28
4e: 4411        add r1, r2
50: e840 1300   strex   r3, r1, [r0]
54: 2b00        cmp r3, #0
56: d1f6        bne.n   46 <_NativeMain+0x46>

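So the atomicity is implemented with an ldrex/strex loop: strex writes 0 to r3 only if the exclusive store succeeded, and the cmp/bne pair retries the whole sequence otherwise.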
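C11 offers no portable way to emit ldrex/strex directly. The closest analogue of the load/modify/store-exclusive/retry shape above is a compare_exchange_weak loop, which ARMv7 compilers typically lower to an ldrex/strex sequence. This is a minimal sketch of that pattern; the function name increment_by is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper mirroring the ldrex / add / strex / bne loop:
   read the current value, compute the new one, try to publish it,
   and retry if another writer (or a spurious failure) intervened. */
static int32_t increment_by(_Atomic int32_t *target, int32_t delta) {
    /* Initial read, roughly the role of ldrex */
    int32_t old = atomic_load_explicit(target, memory_order_relaxed);
    int32_t updated;
    do {
        updated = old + delta; /* roughly "add r1, r2" */
        /* Roughly strex + cmp/bne: on failure, 'old' is refreshed
           with the current value and the loop recomputes 'updated'. */
    } while (!atomic_compare_exchange_weak(target, &old, updated));
    return updated;
}
```

The weak form is used because, like strex, it is allowed to fail spuriously, and the loop absorbs those failures.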

If I'm interpreting the information at community.arm.com correctly, the required alignment is DWORD (4-byte) alignment for 4-byte exclusive operations (ldrex/strex) and QWORD (8-byte) alignment for 8-byte exclusive operations (ldrexd/strexd).
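To act on that requirement, you can check a variable's address before handing it to an atomic operation. A field inside a packed record is one place where natural alignment is not guaranteed. This is a minimal sketch; the helper name is_aligned is hypothetical, and the expected alignments (4 for a 32-bit value, 8 for a 64-bit value) are taken from the reading above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a multiple of 'alignment',
   which must be a power of two (e.g. 4 for ldrex/strex on a
   32-bit value, 8 for ldrexd/strexd on a 64-bit value). */
static bool is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p & (uintptr_t)(alignment - 1)) == 0;
}
```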

The other atomic functions are implemented in a similar way, so the same requirements should apply.

AtomicDecrement(a);

68: 980f        ldr r0, [sp, #60]   ; 0x3c
6a: e850 1f00   ldrex   r1, [r0]
6e: 9a0e        ldr r2, [sp, #56]   ; 0x38
70: 1a89        subs    r1, r1, r2
72: e840 1300   strex   r3, r1, [r0]
76: 2b00        cmp r3, #0
78: d1f6        bne.n   68 <_NativeMain+0x68>

AtomicExchange(a, b);

82: 990f        ldr r1, [sp, #60]   ; 0x3c
84: 6008        str r0, [r1, #0]
86: 4873        ldr r0, [pc, #460]  ; (254 <_NativeMain+0x254>)
88: 9a10        ldr r2, [sp, #64]   ; 0x40
8a: 5880        ldr r0, [r0, r2]
8c: 6800        ldr r0, [r0, #0]
8e: f3bf 8f5b   dmb ish
92: 900d        str r0, [sp, #52]   ; 0x34
94: 980f        ldr r0, [sp, #60]   ; 0x3c
96: e850 1f00   ldrex   r1, [r0]
9a: 9b0d        ldr r3, [sp, #52]   ; 0x34
9c: e840 3200   strex   r2, r3, [r0]
a0: 2a00        cmp r2, #0
a2: 910c        str r1, [sp, #48]   ; 0x30
a4: d1f6        bne.n   94 <_NativeMain+0x94>

AtomicCmpExchange(a, 42, 17);

ae: 990f        ldr r1, [sp, #60]   ; 0x3c
b0: 6008        str r0, [r1, #0]
b2: f3bf 8f5b   dmb ish
b6: 202a        movs    r0, #42 ; 0x2a
b8: 2211        movs    r2, #17
ba: 900b        str r0, [sp, #44]   ; 0x2c
bc: 920a        str r2, [sp, #40]   ; 0x28
be: 980f        ldr r0, [sp, #60]   ; 0x3c
c0: e850 1f00   ldrex   r1, [r0]
c4: 9a0a        ldr r2, [sp, #40]   ; 0x28
c6: 4291        cmp r1, r2
c8: d105        bne.n   d6 <_NativeMain+0xd6>
ca: 990b        ldr r1, [sp, #44]   ; 0x2c
cc: 9a0f        ldr r2, [sp, #60]   ; 0x3c
ce: e842 1000   strex   r0, r1, [r2]
d2: 2800        cmp r0, #0
d4: d1f3        bne.n   be <_NativeMain+0xbe>
gabr answered Nov 15 '22 07:11



Atomicity is usually implemented using the LDREX and STREX (Load Exclusive / Store Exclusive) instructions. These instructions rely on a mechanism called exclusive monitors. See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/ch01s02s01.html and look for 'Exclusives Reservation Granule'.

So the alignment requirements are implementation-specific and are determined by the exclusive monitor mechanism implemented in your hardware. I would suggest checking the exclusive monitor section of your CPU/SoC documentation.
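Since the reservation granule size is hardware-specific, a small helper can tell whether two variables fall into the same granule once you know that size. This is a sketch under stated assumptions. The function name same_granule is hypothetical, and granule_bytes must be a power of two taken from your CPU/SoC documentation; it is not queried from the hardware here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addresses a and b lie in the same
   reservation granule. granule_bytes is an assumed, hardware-specific
   power of two supplied by the caller. */
static bool same_granule(const void *a, const void *b, uintptr_t granule_bytes) {
    uintptr_t mask = ~(granule_bytes - 1); /* clears the offset-within-granule bits */
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}
```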

For example, when internal monitors are used, they are usually implemented at the cache level (usually L2), with one monitor per cache line.

  • Thus your atomic data should be contained within a single cache line; the alignment requirement follows from this.
  • If multiple atomics occupy the same cache line, then while one of them is held exclusively, all the others in that line are in a false exclusive state. This causes inefficient locking, which cache-line-aligned atomics avoid. Note: multiple atomics in the same cache line will still work correctly, just less efficiently.
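The second point can be illustrated by padding each atomic out to its own cache line. This is a minimal C11 sketch. The 64-byte line size is an assumption (line sizes vary between cores, so check your target's documentation), and padded_counter is a hypothetical type name. It also assumes the reservation granule is no larger than one cache line.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed cache line size; verify against the target CPU. */
#define CACHE_LINE_SIZE 64

/* Each counter starts on its own cache line, so exclusive access to
   one counter does not share a line (and monitor) with its neighbours. */
typedef struct {
    alignas(CACHE_LINE_SIZE) _Atomic int32_t value;
} padded_counter;

/* Example: four independent counters, one per cache line. */
static padded_counter counters[4];
```

The trade-off is memory: each 4-byte counter now occupies a full line. That is why this padding is usually reserved for heavily contended variables.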
Arun Valiaparambil answered Nov 15 '22 07:11
