Is it worse in any aspect to use the CMPXCHG instruction on an 8-bit field than on a 32-bit field?

1 Answer

No, there's no penalty for lock cmpxchg [mem], reg with an 8-bit operand vs. a 32-bit one. Modern x86 CPUs can load and store to their L1d cache with no penalty for a single byte vs. an aligned dword or qword. See "Can modern x86 hardware not store a single byte to memory?" The answer there: it can, with zero penalty¹, because x86 CPUs spend the transistors to make even unaligned loads/stores fast.
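As a minimal sketch of what this looks like from C11, the two helpers below do the same compare-and-swap on an 8-bit and a 32-bit atomic. The function names are hypothetical, not from the answer. On x86, compilers typically implement both with lock cmpxchg, differing only in operand size, but check your own compiler's output rather than taking that as given.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; the names are assumptions. */

/* Try to change *flag from `expected` to `desired` (8-bit operand). */
bool cas_u8(_Atomic uint8_t *flag, uint8_t expected, uint8_t desired)
{
    return atomic_compare_exchange_strong(flag, &expected, desired);
}

/* Same operation on a 32-bit operand. */
bool cas_u32(_Atomic uint32_t *word, uint32_t expected, uint32_t desired)
{
    return atomic_compare_exchange_strong(word, &expected, desired);
}

/* Small self-check that exercises both widths the same way. */
void cas_width_demo(void)
{
    _Atomic uint8_t  b = 1;
    _Atomic uint32_t w = 1;

    bool ok = cas_u8(&b, 1, 2);       /* expected matches: stores 2 */
    assert(ok);
    ok = cas_u8(&b, 1, 3);            /* stale expected value: no store */
    assert(!ok);
    assert(atomic_load(&b) == 2);

    ok = cas_u32(&w, 1, 2);
    assert(ok);
    ok = cas_u32(&w, 1, 3);
    assert(!ok);
    assert(atomic_load(&w) == 2);
}
```

Calling cas_width_demo() from main runs the same sequence on both widths; the only difference in the source is the type.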

The surrounding asm instructions dealing with a narrow integer in a register should also have negligible extra cost, if any, vs. [u]int32_t. See "Why doesn't GCC use partial registers?": most compilers know how to be careful with partial registers, and modern CPUs (Haswell and later, and all non-Intel) don't rename the low 8 bits separately from the rest of the register, so the only danger is false dependencies. Depending on exactly what you're doing, it might be best to use unsigned local temporaries with an _Atomic uint8_t, or it might be best to make your locals uint8_t as well.
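One way the "unsigned local temporaries with an _Atomic uint8_t" option might look is sketched below: a CAS retry loop that does its arithmetic in a full-width unsigned and narrows only when passing the value to the atomic operation. The function name and the saturating-increment task are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: saturating increment of an 8-bit atomic counter.
   Returns the value the counter held before the update. */
uint8_t saturating_inc_u8(_Atomic uint8_t *counter)
{
    assert(counter != NULL);

    /* `old` must be uint8_t: the CAS compares and refreshes it. */
    uint8_t old = atomic_load_explicit(counter, memory_order_relaxed);

    for (;;) {
        unsigned next = old;          /* full-width temporary for the math */
        if (next < UINT8_MAX)
            next++;

        /* Narrow only at the point of the atomic operation. */
        if (atomic_compare_exchange_weak(counter, &old, (uint8_t)next))
            return old;
        /* On failure, `old` now holds the current value; retry. */
    }
}
```

Whether unsigned or uint8_t locals produce better code depends on the surrounding computation, so it is worth comparing the generated asm for your actual use case.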

Footnote 1: Unlike on some non-x86 CPUs, where a byte store actually is implemented with a cache RMW cycle (see "Are there any modern CPUs where a cached byte store is actually slower than a word store?"). On those CPUs you'd hope that an atomic xchg would be just as cheap for a word as for a byte, but that's too much to hope for with cmpxchg. Almost all non-x86 ISAs have LL/SC instead of xchg / cmpxchg anyway, so even an atomic exchange is separate LL and SC instructions, and the SC would take an RMW cycle to commit to cache.
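For contrast with compare-and-swap, here is a minimal sketch of an unconditional atomic exchange on a byte, written as a hypothetical try-lock (the names and the 0 = free / 1 = held convention are assumptions). On x86 this kind of operation is typically compiled to xchg; on an LL/SC ISA without a native atomic-swap instruction, compilers typically expand it into a load-linked / store-conditional retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-sized lock flag: 0 = free, 1 = held. */

/* Unconditionally swap in 1; we own the lock if the old value was 0. */
bool try_lock_u8(_Atomic uint8_t *lock)
{
    assert(lock != NULL);
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* Release the lock with a plain atomic store; no RMW needed. */
void unlock_u8(_Atomic uint8_t *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Unlike the CAS case, the exchange never needs to compare against the old contents, which is why the footnote treats xchg and cmpxchg separately.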


answered Nov 02 '22 05:11

Peter Cordes


Tags:

c

x86

assembly

instruction-set

c11

Dewr
