I'm trying to create a local array of some POD values (e.g. double) with a fixed max_size that is known at compile time, then read a runtime size value (size <= max_size) and process the first size elements of that array.
The question is: why doesn't the compiler eliminate the stack reads and writes when arr and size are placed into the same struct/class, as opposed to the case where arr and size are independent local variables?
Here's my code:
#include <cstddef>
constexpr std::size_t max_size = 64;
extern void process_value(double& ref_value);
void test_distinct_array_and_size(std::size_t size)
{
double arr[max_size];
std::size_t arr_size = size;
for (std::size_t i = 0; i < arr_size; ++i)
process_value(arr[i]);
}
void test_array_and_size_in_local_struct(std::size_t size)
{
struct
{
double arr[max_size];
std::size_t size;
} array_wrapper;
array_wrapper.size = size;
for (std::size_t i = 0; i < array_wrapper.size; ++i)
process_value(array_wrapper.arr[i]);
}
Assembly output for test_distinct_array_and_size from Clang with -O3:
test_distinct_array_and_size(unsigned long): # @test_distinct_array_and_size(unsigned long)
push r14
push rbx
sub rsp, 520
mov r14, rdi
test r14, r14
je .LBB0_3
mov rbx, rsp
.LBB0_2: # =>This Inner Loop Header: Depth=1
mov rdi, rbx
call process_value(double&)
add rbx, 8
dec r14
jne .LBB0_2
.LBB0_3:
add rsp, 520
pop rbx
pop r14
ret
Assembly output for test_array_and_size_in_local_struct:
test_array_and_size_in_local_struct(unsigned long): # @test_array_and_size_in_local_struct(unsigned long)
push r14
push rbx
sub rsp, 520
mov qword ptr [rsp + 512], rdi
test rdi, rdi
je .LBB1_3
mov r14, rsp
xor ebx, ebx
.LBB1_2: # =>This Inner Loop Header: Depth=1
mov rdi, r14
call process_value(double&)
inc rbx
add r14, 8
cmp rbx, qword ptr [rsp + 512]
jb .LBB1_2
.LBB1_3:
add rsp, 520
pop rbx
pop r14
ret
Latest GCC and MSVC compilers do basically the same thing with the stack reads and writes.
As we can see, reads and writes of the array_wrapper.size variable on the stack are not optimized away in the latter case: there is a write of the size value to [rsp + 512] before the loop begins, and a read from that location after each iteration.
So the compiler apparently expects that we might modify array_wrapper.size from within the process_value(array_wrapper.arr[i]) call (by taking the address of the current array element and applying some offset to it?). But if we actually tried to do that from inside the call, wouldn't it be undefined behavior?
When we rewrite the loop as
for (std::size_t i = 0, sz = array_wrapper.size; i < sz; ++i)
    process_value(array_wrapper.arr[i]);
the unnecessary reads at the end of each iteration are gone. But the initial write to [rsp + 512] remains, meaning the compiler still expects the process_value calls to be able to reach the array_wrapper.size variable at that location (through similar offset-based tricks).
Why?
Is this just a shortcoming of current compiler implementations (one that will hopefully be fixed soon)? Or does the C++ standard indeed mandate behavior that leads to less efficient code whenever we put an array and its size into the same class?
P.S.
I realize the code example above might seem a bit contrived, but consider this: I'd like to use a lightweight boost::container::static_vector-like class template for safer and more convenient "C++-style" manipulation of pseudo-dynamic arrays of POD elements. So my PODVector contains an array and a size_t in the same class:
#include <cassert>      // assert, used in the constructor, resize and operator[]
#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_pod

template<typename T, std::size_t MaxSize>
class PODVector
{
static_assert(std::is_pod<T>::value, "T must be a POD type");
private:
T _data[MaxSize];
std::size_t _size = 0;
public:
using iterator = T *;
public:
static constexpr std::size_t capacity() noexcept
{
return MaxSize;
}
constexpr PODVector() noexcept = default;
explicit constexpr PODVector(std::size_t initial_size)
: _size(initial_size)
{
assert(initial_size <= capacity());
}
constexpr std::size_t size() const noexcept
{
return _size;
}
constexpr void resize(std::size_t new_size)
{
assert(new_size <= capacity());
_size = new_size;
}
constexpr iterator begin() noexcept
{
return _data;
}
constexpr iterator end() noexcept
{
return _data + _size;
}
constexpr T & operator[](std::size_t position)
{
assert(position < _size);
return _data[position];
}
};
Usage:
void test_pod_vector(std::size_t size)
{
PODVector<double, max_size> arr(size);
for (double& val : arr)
process_value(val);
}
If the issue described above is indeed mandated by the C++ standard (and is not the fault of compiler writers), such a PODVector can never be as efficient as the raw use of an array plus an "unrelated" size variable. That would be quite bad for C++ as a language that aims for zero-overhead abstractions.
This is because void process_value(double& ref_value); accepts its argument by reference. The compiler/optimizer must assume aliasing, i.e. that the process_value function can change any memory reachable through the reference ref_value, and hence the size member that follows the array.
The compiler assumes that, because the array and size are members of one and the same object array_wrapper, the function process_value could potentially convert the reference to the first element (on the first invocation) into a reference to the whole object (and store it elsewhere), or access the object as unsigned char and read or replace its entire representation. So after the function returns, the state of the object must be reloaded from memory.
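The kind of access being guarded against can be sketched like this (my own illustration, not from the answer; Wrapper is a hypothetical stand-in for the question's anonymous struct, and whether such code is strictly well-defined is exactly what the question debates — but it does work on mainstream compilers, so the optimizer must account for it):

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical enclosing object, mirroring the question's anonymous struct.
struct Wrapper {
    double arr[64];
    std::size_t size;
};

// A callee that receives a reference to the first element and, at least
// in practice, reaches back to the enclosing object and clobbers its
// size member. This is the store the optimizer must assume might happen.
void clobber_size(double& ref_value)
{
    char* base = reinterpret_cast<char*>(&ref_value) - offsetof(Wrapper, arr);
    reinterpret_cast<Wrapper*>(base)->size = 0;
}
```

With a callee like this, caching array_wrapper.size in a register across the call would change observable behavior, which is why the compiler reloads it from the stack.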
When size is a stand-alone object on the stack, the compiler/optimizer assumes that nothing else could possibly have a reference or pointer to it, and caches it in a register.
In his talk "Optimizing the Emergent Structures of C++", Chandler Carruth explains why optimizers have difficulty with calls to functions accepting reference/pointer arguments. Use reference/pointer function arguments only when absolutely necessary.
If you would like to change the value, the more performant option is:
double process_value(double value);
And then:
array_wrapper.arr[i] = process_value(array_wrapper.arr[i]);
This change results in optimal assembly:
.L23:
movsd xmm0, QWORD PTR [rbx]
add rbx, 8
call process_value2(double)
movsd QWORD PTR [rbx-8], xmm0
cmp rbx, rbp
jne .L23
Or:
for(double& val : arr)
val = process_value(val);
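A minimal self-contained sketch of that by-value pattern (process_value here is a local stand-in for the external function — it simply doubles its input — and process_all is a hypothetical driver loop):

```cpp
#include <cstddef>

// Stand-in for the external process_value: takes and returns by value,
// so no reference to the caller's memory escapes into the callee.
static double process_value(double value)
{
    return value * 2.0;
}

// With no escaping references, the compiler can keep the loop bound in
// a register even when it lives in the same struct as the array.
void process_all(double* arr, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        arr[i] = process_value(arr[i]);
}
```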