Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When should you not use [[carries_dependency]]?

I've found questions (like this one) asking what [[carries_dependency]] does, and that's not what I'm asking here.

I want to know when you shouldn't use it, because the answers I've read all make it sound like you can plaster this code everywhere and magically you'd get equal or faster code. One comment said the code can be equal or slower, but the poster didn't elaborate.

I imagine appropriate places to use this is on any function return or parameter that is a pointer or reference and that will be passed or returned within the calling thread, and it shouldn't be used on callbacks or thread entry points.

Can someone comment on my understanding and elaborate on the subject in general, of when and when not to use it?

EDIT: I know there's this tome on the subject, should any other reader be interested; it may contain my answer, but I haven't had the chance to read through it yet.

like image 944
Matthew Reddington Avatar asked Nov 21 '13 16:11

Matthew Reddington


2 Answers

In modern C++ you should generally not use std::memory_order_consume or [[carries_dependency]] at all. They're essentially deprecated while the committee comes up with a better mechanism that compilers can practically implement.

And that hopefully doesn't require sprinkling [[carries_dependency]] and kill_dependency all over the place.

2016-06 P0371R1: Temporarily discourage memory_order_consume

It is widely accepted that the current definition of memory_order_consume in the standard is not useful. All current compilers essentially map it to memory_order_acquire. The difficulties appear to stem both from the high implementation complexity, from the fact that the current definition uses a fairly general definition of "dependency", thus requiring frequent and inconvenient use of the kill_dependency call, and from the frequent need for [[carries_dependency]] annotations. Details can be found in e.g. P0098R0.

Notably that in C++ x - x still carries a dependency but most compilers would naturally break the dependency and replace that expression with a constant 0. But also compilers do sometimes turn data dependencies into control dependencies if they can prove something about value-ranges after a branch.


On modern compilers that just promote mo_consume to mo_acquire, fully aggressive optimizations can always happen; there's never anything to gain from [[carries_dependency]] and kill_dependency even in code that uses mo_consume, let alone in other code.


This strengthening to mo_acquire has potentially-significant performance cost (an extra barrier) for real use-cases like RCU on weakly-ordered ISAs like POWER and ARM. See this video of Paul E. McKenney's CppCon 2015 talk C++ Atomics: The Sad Story of memory_order_consume. (Link includes a summary).

If you want real dependency-ordering read-only performance, you have to "roll your own", e.g. by using mo_relaxed and checking the asm to verify it compiled to asm with a dependency. (Avoid doing anything "weird" with such a value, like passing it across functions.) DEC Alpha is basically dead and all other ISAs provide dependency ordering in asm without barriers, as long as the asm itself has a data dependency.

If you don't want to roll your own and live dangerously, it might not hurt to keep using mo_consume in "simple" use-cases where it should be able to work; perhaps some future mo_consume implementation will have the same name and work in a way that's compatible with C++11.


There is ongoing work on making a new consume, e.g. 2018's http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html

like image 124
Peter Cordes Avatar answered Sep 30 '22 12:09

Peter Cordes


because the answers I've read all make it sound like you can plaster this code everywhere and magically you'd get equal or faster code

The only way you can get faster code is when that annotation allows the omission of a fence.

So the only case where it could possibly be useful is:

  • your program uses consume ordering on an atomic load operation, in an important frequently executed code;
  • the "consume value" isn't just used immediately and locally, but also passed to other functions;
  • the target CPU gives specific guarantees for consuming operations (as strong as a given fence before that operation, just for that operation);
  • the compiler writers take their job seriously: they manage to translate high level language consuming of a value to CPU level consuming, to get the benefit from CPU guarantees.

That's a bunch of necessary conditions to possibly get measurably faster code.

(And the latest trend in the C++ community is to give up inventing a proper compiling scheme that's safe in all cases and to come up with a completely different way for the user to instruct the compiler to produce code that "consumes" values, with much more explicit, naively translatable, C++ code.)

One comment said the code can be equal or slower, but the poster didn't elaborate.

Of course annotations of the kind that you can randomly put on programs simply cannot make code more efficient in general! That would be too easy and also self contradictory.

Either some annotation specifies a constrain on your code, that is a promise to the compiler, and you can't put it anywhere where it doesn't correspond an guarantee in the code (like noexcept in C++, restrict in C), or it would break code in various ways (an exception in a noexcept function stops the program, aliasing of restricted pointers can cause funny miscompilation and bad behavior (formerly the behavior is not defined in that case); then the compiler can use it to optimize the code in specific ways.

Or that annotation doesn't constrain the code in any way, and the compiler can't count on anything and the annotation does not create any more optimization opportunity.

If you get more efficient code in some cases at no cost of breaking program with an annotation then you must potentially get less efficient code in other cases. That's true in general and specifically true with consume semantic, which imposes the previously described constrained on translation of C++ constructs.

I imagine appropriate places to use this is on any function return or parameter that is a pointer or reference and that will be passed or returned within the calling thread

No, the one and only case where it might be useful is when the intended calling function will probably use consume memory order.

like image 44
curiousguy Avatar answered Sep 30 '22 13:09

curiousguy