Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can't GCC assume that std::vector::size won't change in this loop?

Tags:

c++

gcc

assembly

I claimed to a coworker that if (i < input.size() - 1) print(0); would get optimized in this loop so that input.size() is not read in every iteration, but it turns out that this is not the case!

void print(int x) {
    std::cout << x << std::endl;
}

void print_list(const std::vector<int>& input) {
    int i = 0;
    for (size_t i = 0; i < input.size(); i++) {
        print(input[i]);
        if (i < input.size() - 1) print(0);
    }
}

According to the Compiler Explorer with gcc options -O3 -fno-exceptions we are actually reading input.size() each iteration and using lea to perform a subtraction!

        movq    0(%rbp), %rdx
        movq    8(%rbp), %rax
        subq    %rdx, %rax
        sarq    $2, %rax
        leaq    -1(%rax), %rcx
        cmpq    %rbx, %rcx
        ja      .L35
        addq    $1, %rbx

Interestingly, in Rust this optimization does occur. It looks like i gets replaced with a variable j that is decremented each iteration, and the test i < input.size() - 1 is replaced with something like j > 0.

fn print(x: i32) {
    println!("{}", x);
}

pub fn print_list(xs: &Vec<i32>) {
    for (i, x) in xs.iter().enumerate() {
        print(*x);
        if i < xs.len() - 1 {
            print(0);
        }
    }
}

In the Compiler Explorer the relevant assembly looks like this:

        cmpq    %r12, %rbx
        jae     .LBB0_4

I checked and I am pretty sure r12 is xs.len() - 1 and rbx is the counter. Earlier there is an add for rbx and a mov outside of the loop into r12.

Why is this? It seems like if GCC is able to inline the size() and operator[] as it did, it should be able to know that size() does not change. But maybe GCC's optimizer judges that it is not worth pulling it out into a variable? Or maybe there is some other possible side effect that would make this unsafe--does anyone know?

like image 792
Jonathan Chan Avatar asked Jan 29 '20 00:01

Jonathan Chan


1 Answers

The non-inline function call to cout.operator<<(int) is a black box for the optimizer (because the library is just written in C++ and all the optimizer sees is a prototype; see discussion in comments). It has to assume any memory that could possibly be pointed to by a global var has been modified.

(Or the std::endl call. BTW, why force a flush of cout at that point instead of just printing a '\n'?)

e.g. for all it knows, std::vector<int> &input is a reference to a global variable, and one of those function calls modifies that global var. (Or there's a global vector<int> *ptr somewhere, or there's a function that returns a pointer to a static vector<int> in some other compilation unit, or some other way that a function could get a reference to this vector without being passed a reference to it by us.

If you had a local variable whose address had never been taken, the compiler could assume that non-inline function calls couldn't mutate it. Because there'd be no way for any global variable to hold a pointer to this object. (This is called Escape Analysis). That's why the compiler can keep size_t i in a register across function calls. (int i can just get optimized away because it's shadowed by size_t i and not used otherwise).

It could do the same with a local vector (i.e. for the base, end_size and end_capacity pointers.)

ISO C99 has a solution for this problem: int *restrict foo. Many C++ compiles support int *__restrict foo to promise that memory pointed to by foo is only accessed via that pointer. Most commonly useful in functions that take 2 arrays, and you want to promise the compiler they don't overlap. So it can auto-vectorize without generating code to check for that and run a fallback loop.

The OP comments:

In Rust a non-mutable reference is a global guarantee that no one else is mutating the value you have a reference to (equivalent to C++ restrict)

That explains why Rust can make this optimization but C++ can't.


Optimizing your C++

Obviously you should use auto size = input.size(); once at the top of your function so the compiler knows it's a loop invariant. C++ implementations don't solve this problem for you, so you have to do it yourself.

You might also need const int *data = input.data(); to hoist loads of the data pointer from the std::vector<int> "control block" as well. It's unfortunate that optimizing can require very non-idiomatic source changes.

Rust is a much more modern language, designed after compiler developers learned what was possible in practice for compilers. It really shows in other ways, too, including portably exposing some of the cool stuff CPUs can do via i32.count_ones, rotate, bit-scan, etc. It's really dumb that ISO C++ still doesn't expose any of these portably, except std::bitset::count().

like image 126
Peter Cordes Avatar answered Oct 23 '22 08:10

Peter Cordes