
Can I trust NVCC to optimize away std::pair in return types?

Sometimes one wants to write a (small) CUDA device-side function that returns two values. In C-style code, you would have that function take two out-parameters, e.g.:

__device__ void pair_maker(float x, float& out1, float& out2);

but in C++, the idiomatic way to write this is to return a std::pair (or perhaps a std::tuple or a struct, but C++ tuples are clunky and a struct is not generic enough):

__device__ std::pair<float, float> pair_maker(float x);

My question: Can I trust NVCC (with --expt-relaxed-constexpr) to optimize away the construction of the pair, and just assign directly to the variables which I later initialize from the .first and .second elements of the pair?

asked Sep 03 '25 06:09 by einpoklum


1 Answer

I don't have a complete answer, but in my limited experience NVCC can optimize the std::pair away. An illustration (also on GodBolt):

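#include <utility>

// Returns both results in a std::pair
// (needs --expt-relaxed-constexpr for the pair's constexpr constructor).
__device__ std::pair<float, float> pair_maker(float x) {
    float sin, cos;
    __sincosf(x, &sin, &cos);
    return {sin, cos};
}

// Unpacks the pair into two separate variables, as in the question.
__device__ float foo(float x) {
    auto p = pair_maker(x);
    auto sin = p.first;
    auto cos = p.second;
    return sin + cos;
}

// Kernel going through the pair-returning function.
__global__ void bar(float x, float *out) { *out = foo(x); }

// Hand-written kernel with no pair at all, for comparison.
__global__ void baz(float x, float *out) {
    float sin, cos;
    __sincosf(x, &sin, &cos);
    *out = sin + cos;
}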

The kernels bar() and baz() compile to the same PTX code:

ld.param.f32    %f1, [param_0];
ld.param.u64    %rd1, [param_1];
cvta.to.global.u64      %rd2, %rd1;
sin.approx.f32  %f2, %f1;
cos.approx.f32  %f3, %f1;
add.f32         %f4, %f2, %f3;
st.global.f32   [%rd2], %f4;
ret;

No extra copies or construction-related operations.
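
As a follow-up sketch (not part of the comparison above, and I have not checked its PTX): with C++17, the call site can unpack the pair with structured bindings instead of naming .first and .second. The macro HD and the functions sincos_pair() and sum_sincos() are hypothetical names introduced here, and sinf/cosf stand in for __sincosf only so that the same file also builds with a plain host compiler.

```cpp
#include <math.h>
#include <utility>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing on a host compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical stand-in for pair_maker(); uses sinf/cosf (not __sincosf)
// so that it is callable from both host and device code.
HD inline std::pair<float, float> sincos_pair(float x) {
    return {sinf(x), cosf(x)};
}

// C++17 structured bindings name the two results directly at the call site.
HD inline float sum_sincos(float x) {
    auto [s, c] = sincos_pair(x);
    return s + c;
}
```

Under nvcc this would presumably need -std=c++17 in addition to --expt-relaxed-constexpr. Compiling with -ptx and comparing against a hand-written kernel, as done above for bar() and baz(), is the way to check whether the pair disappears in this variant as well.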

answered Sep 04 '25 23:09 by einpoklum