
Understanding C++ constexpr Performance

I recently wrote a compile-time ray tracer using constexpr functions with C++17. The full source code can be seen here. The relevant code for this question looks like this:

constexpr auto image = []() {
    StaticImage<image_width, image_height> image;

    Camera camera{Pointf{0.0f, 0.0f, 500.0f},
                  Vectorf{0.0f},
                  Vectorf{0.0f, 1.0f, 0.0f},
                  500.0f};

    std::array<Shapes, 1> shapes_list{Sphere{Pointf{0.0f}, 150.0f}};
    std::array<Materials, 1> materials_list{DefaultMaterial{}};
    ShapeContainer<decltype(shapes_list)> shapes{std::move(shapes_list)};
    MaterialContainer<decltype(materials_list)> materials{
        std::move(materials_list)};

    SphereScene scene;
    scene.set_camera(camera);

    Renderer::render(scene, image, shapes, materials);
    return image;
}();

Each of the classes shown here (StaticImage, Camera, Shapes, Materials, ShapeContainer, MaterialContainer, and SphereScene) consists entirely of constexpr functions. Renderer::render is also constexpr and is in charge of looping over every pixel in the image, shooting rays into the scene, and setting the corresponding colour.

With this current setup and an image of 512x512, using MSVC 16.9.2 in Release mode, the compiler takes approximately 35 minutes to finish generating the image. During this process, its memory usage rises to the point where it ends up using almost 64GB of RAM.

So, my question is: why are the compilation time and memory usage so high?

My theory was that part of the reason for the compilation time was the complexity of the call stacks (lots of templates, CRTP, and deep call chains), so I tried simplifying them by removing several templates (the Vector class is no longer templated, for example). That reduced the compilation time to 32 minutes and the memory usage to 61GB. Better, but still very high.

The thing is that I can't quite figure out why it's so slow. I do understand that evaluating all of the constexpr functions is a very involved process (the compiler has to check for UB, perform type deduction, etc.), but I wasn't expecting it to be quite this slow. I'm also really confused by the high memory usage. The image array itself uses no more than 4MB of memory (512 * 512 * 3 * sizeof(float)), so where is the extra memory coming from?
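As a sanity check on that figure, the size of the pixel data can be computed by the compiler itself. This is only a sketch: the `channels` and `image_bytes` names, the concrete values of the constants, and the 4-byte `float` are assumptions made for illustration, not code taken from the project.

```cpp
#include <cstddef>

// Assumed constants mirroring the 512x512 RGB float image described above.
constexpr std::size_t image_width = 512;
constexpr std::size_t image_height = 512;
constexpr std::size_t channels = 3;  // hypothetical name: one float each for R, G, B

// Assumption: float is 4 bytes, as it is on MSVC and other mainstream compilers.
static_assert(sizeof(float) == 4);

// Raw size of the pixel data, in bytes.
constexpr std::size_t image_bytes =
    image_width * image_height * channels * sizeof(float);

// 3,145,728 bytes, i.e. 3 MiB, so comfortably under the 4MB bound.
static_assert(image_bytes == 3145728);
```

So the result itself is tiny; whatever is consuming tens of gigabytes has to be bookkeeping done by the compiler while evaluating the code, not the image data.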

Mauricio asked Oct 14 '22 21:10


1 Answer

Compile-time execution is going to be much less efficient than runtime execution: the compiler has to do more work to execute the same code. The point of compile-time execution is to do computations that you can't do at runtime and, sometimes, to cache the results of simpler computations at compile time.

Writing a whole, non-trivial application that exists only at compile-time is not going to be fast.
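To illustrate the "compile-time cache" use case, here is a minimal C++17 sketch. The names `make_squares`, `squares`, and `square_of` are hypothetical and chosen only for this example; the point is that the compiler runs a short loop once and the program just stores the result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: precompute n*n for n in [0, N) during compilation.
template <std::size_t N>
constexpr std::array<unsigned, N> make_squares() {
    std::array<unsigned, N> table{};  // value-initialised, as constexpr evaluation requires
    for (std::size_t i = 0; i < N; ++i) {
        table[i] = static_cast<unsigned>(i * i);
    }
    return table;
}

// Because `squares` is constexpr, the initialiser must be evaluated by the compiler.
constexpr auto squares = make_squares<16>();

static_assert(squares[0] == 0);
static_assert(squares[15] == 225);

// Runtime code only indexes the precomputed table.
unsigned square_of(std::size_t n) {
    assert(n < squares.size());
    return squares[n];
}
```

A table of 16 entries costs the compiler almost nothing. A ray tracer that evaluates millions of floating-point operations through the same mechanism is a very different workload.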

As for the particulars, the principal reason for the cost increase is that compile-time execution has to detect all undefined behavior. This means that a lot of things that might otherwise just be offsetting a pointer have to be more complicated. A stack variable can't just be an offset from the stack pointer; the evaluator has to track the lifetime of each object explicitly. And so forth.
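A small sketch of what that checking looks like from the outside, assuming C++17. The functions `add`, `read_at`, and `checked_read` are hypothetical. The commented-out lines are ones a conforming compiler is expected to reject, because an expression that hits undefined behavior during evaluation is not a constant expression.

```cpp
#include <cassert>
#include <climits>

// Signed overflow in `a + b` is undefined behaviour.
constexpr int add(int a, int b) { return a + b; }

static_assert(add(1, 2) == 3);  // no UB on this path, so this is a constant expression

// Expected to fail to compile: the evaluator has to notice the overflow.
// constexpr int bad_sum = add(INT_MAX, 1);

// Indexing past the end of `values` is undefined behaviour.
constexpr int read_at(int index) {
    int values[4] = {10, 20, 30, 40};
    return values[index];
}

static_assert(read_at(2) == 30);

// Expected to fail to compile: the evaluator has to know the array's bounds.
// constexpr int bad_read = read_at(4);

// At run time the same function is a plain load with no such check,
// so any bounds test has to be written by hand.
int checked_read(int index) {
    assert(index >= 0 && index < 4);
    return read_at(index);
}
```

To reject the two commented-out lines, the evaluator has to carry extra information about every value and object it touches, such as array bounds and whether an object is still alive. That bookkeeping is paid on every operation, including the ones that turn out to be fine.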

Compile-time execution is basically interpreted C++. And there's not much reason to make it a particularly fast interpreter. Most compile-time operations are dealing with computations based on types and simple values, not with complex data structures. So that's what compilers are primarily optimized for.

I recall that there was some talk recently about improving Clang's constexpr execution through a better interpreter. But I don't know how much came of it.

Nicol Bolas answered Oct 18 '22 14:10