
Move data from an array of unaligned structs to an array of aligned structs in C++

What is the best way to move data from an array of CameraSpacePoint to an array of PointXYZ?

struct CameraSpacePoint
{
    float X;
    float Y;
    float Z;
};

__declspec(align(16))
struct PointXYZ
{
    float x;
    float y;
    float z;
};

constexpr int BIG_VAL = 1920 * 1080;

CameraSpacePoint  camera_space_points[BIG_VAL];
PointXYZ          points_xyz[BIG_VAL];

My solution:

CameraSpacePoint* camera_space_points_ptr = &camera_space_points[0];
PointXYZ*         points_xyz_ptr          = &points_xyz[0];

for (int i = 0; i < BIG_VAL; ++i)
{
    memcpy(points_xyz_ptr++, camera_space_points_ptr++, sizeof(CameraSpacePoint));
}

Is this the most efficient way?

Dmitry Zhivaev asked Aug 15 '18 10:08


2 Answers

As always, readability and maintainability trump other concerns. Write what you mean, and don't fix what isn't a problem: measure before you optimize.

std::transform(camera_space_points, std::end(camera_space_points), points_xyz,
    [](auto c){
        return PointXYZ{c.X, c.Y, c.Z};
    });

This is what you should write by default. Judging by the assembly output and a quick benchmark, it is pretty much equivalent to the memcpy version.

On a more hand-wavy note, optimizers are really good at micro-optimizing simple code such as copying a large chunk of memory; manual optimizations are rarely better.
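To make "measure before you optimize" concrete, here is a minimal sketch of how the two versions could be compared. It is not the benchmark referred to above: `copy_memcpy`, `copy_transform` and `time_ms` are hypothetical helper names, and `alignas(16)` is used as a portable stand-in for `__declspec(align(16))`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

struct CameraSpacePoint { float X, Y, Z; };

// Assumption: alignas(16) replaces the MSVC-specific __declspec(align(16)).
struct alignas(16) PointXYZ { float x, y, z; };

// The question's approach: one 12-byte memcpy per element.
void copy_memcpy(const CameraSpacePoint* src, PointXYZ* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        std::memcpy(dst + i, src + i, sizeof(CameraSpacePoint));
}

// The std::transform approach: construct each PointXYZ from the fields.
void copy_transform(const CameraSpacePoint* src, PointXYZ* dst, std::size_t n) {
    std::transform(src, src + n, dst, [](const CameraSpacePoint& c) {
        return PointXYZ{c.X, c.Y, c.Z};
    });
}

// Hypothetical timing helper: average wall-clock milliseconds per call of f.
template <class F>
double time_ms(F&& f, int repeats = 20) {
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        f();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

A call such as `time_ms([&] { copy_transform(src.data(), dst.data(), src.size()); })` on two `std::vector`s of 1920 * 1080 points gives a rough per-call figure for each version. The numbers depend on compiler, flags and hardware, and an optimizer may drop a copy whose result is never read, so treat this as a starting point rather than a rigorous benchmark.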

Passer By answered Oct 23 '22 08:10


An alternative is to make sure you copy in chunks of 16 bytes. That lets the compiler use instructions that copy 16 bytes at once (without any surrounding shuffles or other unnecessary complications), if such instructions exist; they do on x64, so this can help on e.g. Xbox One and PC. Each PointXYZ is 16 bytes, so writing 16 bytes to it is fine. The source elements are 12 bytes each, so every 16-byte copy also picks up 4 bytes from the next element; those bytes land in the padding of the target PointXYZ and are ignored. The last CameraSpacePoint does not necessarily have 4 readable bytes after it (it might end just before an unmapped or unreadable memory region), so it needs care to avoid reading past the end, unless the array can be extended a little to guarantee that the memory exists.

For example:

auto dst = points_xyz;           // PointXYZ*, the array decays to a pointer
auto src = camera_space_points;  // CameraSpacePoint*
for (int i = 0; i + 1 < BIG_VAL; ++i)
    std::memcpy(dst++, src++, 16);  // 16 == sizeof(PointXYZ)
// last point is special, since the src may not have 16 bytes left to read
std::memcpy(dst, src, sizeof(CameraSpacePoint));

(on godbolt)
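The trick only works if the two structs really are 12 and 16 bytes. A small sketch that spells those assumptions out with `static_assert` and wraps the loop in a function might look like this; `copy_points_16` is a hypothetical name, and `alignas(16)` is assumed as a portable replacement for `__declspec(align(16))`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct CameraSpacePoint { float X, Y, Z; };

// Assumption: alignas(16) replaces the MSVC-specific __declspec(align(16)).
struct alignas(16) PointXYZ { float x, y, z; };

// Layout facts the 16-byte copy depends on; compilation fails if they do not hold.
static_assert(sizeof(CameraSpacePoint) == 12, "source elements must be tightly packed");
static_assert(sizeof(PointXYZ) == 16, "destination elements must be padded to 16 bytes");
static_assert(alignof(PointXYZ) == 16, "destination elements must be 16-byte aligned");

// Hypothetical helper: copies n points, 16 bytes at a time except for the last one.
void copy_points_16(const CameraSpacePoint* src, PointXYZ* dst, std::size_t n) {
    if (n == 0)
        return;
    assert(src != nullptr && dst != nullptr);
    // Each copy takes one point plus 4 bytes of the next; the extra bytes
    // land in the padding of the destination element.
    for (std::size_t i = 0; i + 1 < n; ++i)
        std::memcpy(dst + i, src + i, sizeof(PointXYZ));
    // The last point is copied with exactly 12 bytes so nothing past the
    // end of the source array is read.
    std::memcpy(dst + (n - 1), src + (n - 1), sizeof(CameraSpacePoint));
}
```

With the arrays from the question this would be called as `copy_points_16(camera_space_points, points_xyz, BIG_VAL);`. Whether it actually beats the plain version is something to measure on the target compiler and hardware.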

harold answered Oct 23 '22 06:10