Weird C++14 and C++17 difference in assignment operator

I have the following code:

#include <vector>
#include <iostream>

std::vector <int> a;

int append(){
  a.emplace_back(0);
  return 10;
}

int main(){
  a = {0};
  a[0] = append();
  std::cout << a[0] << '\n';
  return 0;
}

The function append() increases the vector's size by one as a side effect. Because of how vectors work, this can trigger a reallocation of its storage when the size exceeds its capacity.

So, when doing a[0] = append(), if a reallocation happens, the reference obtained from a[0] is invalidated and points into the vector's old storage. Because of this you could expect the vector to end up as {0, 0} instead of {10, 0}, since the assignment writes to the old a[0] instead of the new one.
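One hypothetical way to see that the reallocation is the culprit (not part of the original program, just an experiment) is to reserve enough capacity up front so that append() never reallocates; then the element reference stays valid and the program prints 10 under either evaluation order:

#include <vector>
#include <iostream>

std::vector<int> a;

int append() {
    a.emplace_back(0);   // size grows, but capacity is already large enough
    return 10;
}

int main() {
    a = {0};
    a.reserve(2);        // guarantee no reallocation inside append()
    a[0] = append();     // no reference is invalidated, so both evaluation
                         // orders write to the live element 0
    std::cout << a[0] << '\n';  // prints 10
}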

The weird thing that confuses me is that this behavior changes between C++14 and C++17.

On C++14, the program will print 0. On C++17, it will print 10, meaning a[0] actually got 10 assigned to it. So, I have the following questions whose answers I couldn't find:

  • Is C++17 evaluating a[0]'s memory address after evaluating the RHS of the assignment expression? Does C++14 evaluate this before and that's why it changes?
  • Was this a bug that was fixed in C++17? What changed in the standard?
  • Is there a clean way to make this assignment behave like in C++17 when using C++14 or C++11?
asked by Blaz Korecic

3 Answers

This code is UB prior to C++17, as pointed out in the comments, due to the C++ evaluation-order rules. The basic problem: order of operations is not order of evaluation. Even something like x++ + x++ is UB.

In C++17, sequencing rules for assignments were changed:

  1. In every simple assignment expression E1=E2 and every compound assignment expression E1@=E2, every value computation and side effect of E2 is sequenced before every value computation and side effect of E1.
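To make the new rule concrete, here is a minimal sketch (my own illustration, not from the standard text) where the right-hand side has a side effect that the left-hand side depends on:

#include <iostream>

int idx = 0;

int set_idx() {
    idx = 5;     // side effect the left-hand side depends on
    return 1;
}

int main() {
    int arr[10] = {};
    // C++17: set_idx() is fully evaluated, including its side effect on idx,
    // before arr[idx] is evaluated, so this writes arr[5].
    // Pre-C++17 the order was unspecified, so arr[0] might be written instead.
    arr[idx] = set_idx();
    std::cout << arr[0] << ' ' << arr[5] << '\n';  // "0 1" under C++17
}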
answered by qz-


The program has undefined behavior prior to C++17 because the evaluation order is unspecified and one of the possible choices leads to using an invalidated reference. (That’s how undefined behavior works: even if you log the two evaluations and the right hand side logs first, it might have evaluated the other way with the effect of the undefined behavior being the incorrect logging.)

While it’s an informal point, the change in C++17 to specify the order of evaluation for certain operators (including =) wasn’t considered to be a bug fix, which is why compilers don’t implement the new rules in prior language modes. (It’s the code that’s broken, not the language.)

Cleanliness is subjective, but the usual way to deal with an ordering issue like this is to introduce a temporary variable:

{  // limit scope
  auto x=append();
  a[0]=x;  // with std::move for non-trivial types
}

This occasionally interferes with passing by value to the assignment operator, which can’t be helped.
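For completeness, here is a sketch of the question's program with that workaround applied (my adaptation, assuming the goal is to get the C++17 result under C++11/14):

#include <vector>
#include <iostream>

std::vector<int> a;

int append() {
    a.emplace_back(0);
    return 10;
}

int main() {
    a = {0};
    {   // limit scope
        // append() finishes (and any reallocation happens) before
        // a[0] is evaluated on the next line.
        auto x = append();
        a[0] = x;
    }
    std::cout << a[0] << '\n';  // prints 10 in C++11/14 as well
}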

answered by Davis Herring


While evaluation order is now specified in more cases, one should depend on the new rules only when they make the code easier to understand, more efficient, or more correct.

Your code has many issues:

  • You are using global variables
  • Your function does two unrelated things
  • You have hard-coded constants in the code
  • You modify the global vector twice in the same expression

More details

It is well known that one should avoid global variables. This is even worse here because your variable has a single-letter name (a), which makes it prone to name conflicts.

The append function does two things: it appends a value and returns an unrelated value. It would be better to have two separate functions:

void append_0_to_a() 
{ 
    a.emplace_back(0); 
}

// I have no idea what 10 represents in your code, so I'm guessing
int get_number_of_fingers() 
{ 
    const int hands = 2; 
    const int fingersPerHand = 5; 
    return hands * fingersPerHand; 
}

The main code will then be much more readable:

a = { 0 };
append_0_to_a();
a[0] = get_number_of_fingers();

But even then, it is not clear why you use so many different statements to modify your vector. Why not simply write something like:

int main()
{
    std::vector <int> a = { get_number_of_fingers(), 0 };
    std::cout << a[0] << '\n';
    return 0;
}

By writing cleaner code, you don't need advanced knowledge of evaluation order, and other people who read your code will understand it more easily.

While evaluation rules were updated mainly in C++11 and then C++17, one should not abuse those rules to write code that is hard to read.

The rules were improved to make code predictable in some cases, such as when a function receives multiple std::unique_ptr arguments. By forcing the compiler to evaluate one argument completely before evaluating the other, the code becomes exception safe (no memory leak) in a case like this:

// Just a simplified example
#include <memory>

void f(std::unique_ptr<int> a, std::unique_ptr<int> b)
{
    // ...
}

int main()
{
    f(std::unique_ptr<int>(new int(2)), std::unique_ptr<int>(new int(3)));
}

The newer rules ensure that the std::unique_ptr constructor for one argument is called before the new call for the other argument, thus preventing a possible leak if an exception is thrown by the second call to new.
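A common alternative, not mentioned above, is std::make_unique (available since C++14): it performs the allocation and the unique_ptr construction inside a single function call, so nothing can leak between the two arguments even under pre-C++17 evaluation rules. A minimal sketch:

#include <memory>

void f(std::unique_ptr<int> a, std::unique_ptr<int> b)
{
    // ...
}

int main()
{
    // Each argument is a single function call; if the second allocation
    // throws, the first temporary unique_ptr is destroyed during stack
    // unwinding, so there is no leak even before C++17.
    f(std::make_unique<int>(2), std::make_unique<int>(3));
}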

The newer rules also guarantee some ordering that matters for user-defined functions and operators, so that chained calls behave intuitively. As far as I know, this is useful for libraries like improved futures and other asynchronous facilities, since under the older rules there was too much undefined or unspecified behavior.
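The sketch below is my reconstruction of the well-known chaining example from the C++17 evaluation-order proposal (P0145), so treat the exact wording as illustrative:

#include <cassert>
#include <string>

int main()
{
    std::string s = "but I have heard it works even if you don't believe in it";
    // C++17 guarantees that each replace() completes before the arguments of
    // the next chained call (such as s.find(...)) are evaluated, so the result
    // below is well defined; before C++17 it was unspecified.
    s.replace(0, 4, "").replace(s.find("even"), 4, "only")
     .replace(s.find(" don't"), 6, "");
    assert(s == "I have heard it works only if you believe in it");
}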

As I have mentioned in a comment, the original rules allowed the compiler to perform more aggressive optimizations.

An expression like i++ + i++ was essentially undefined because, in the general case where the variables are different (say i and j), a compiler can reorder instructions to produce more efficient code, and the compiler didn't have to consider the special case where the same variable is repeated, in which case the generated code might give different results depending on how it was implemented.

Most of the original rules are based on C, which does not support operator overloading.

However, with user-defined types that flexibility was not always desirable, as it sometimes produced wrong results for code that looks right (as in my simplified f function above).

Thus the rules were defined more precisely to fix such undesirable behavior. The newer rules also apply to operators on built-in types like int.
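A minimal sketch of that on a built-in type (my example, not from the answer); this expression was undefined before C++17 and is well defined since:

#include <iostream>

int main()
{
    int i = 0;
    // Before C++17 the increment of i and the assignment to i were
    // unsequenced, so this was undefined behaviour.  Since C++17 the
    // right-hand side (i++ + 1) is fully evaluated first, so i becomes 1.
    i = i++ + 1;
    std::cout << i << '\n';  // prints 1 under C++17
}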

As one should never write code that depends on undefined behavior, any valid program written before C++11 is also valid under the stricter rules of C++11. The same goes for C++17.

On the other hand, if your program relies on the C++17 rules, it might have undefined behavior if you compile it with an older compiler.

Generally, one would start writing code that relies on the newer rules only once there is no need to go back to an older compiler.

Obviously, if one writes libraries for others, then one needs to ensure that the code does not have undefined behavior for any supported C++ standard.

answered by Phil1970