
Reordering 3D vector triplets in column-major order is slow

Tags: c++, c, simd, sse

I have a lot of single-precision vector triplets (x1,y1,z1),(x2,y2,z2),(x3,y3,z3), and I want to reorder each one so that (x1,y1,z1),(x2,y2,z2),(x3,y3,z3) becomes (x1,x2,x3,0,y1,y2,y3,0,z1,z2,z3,0).

The goal is to prepare the dataset for an SSE-based calculation. I have the following code to do this:

for (int i=0;i<count;i++)
{
    Vect3F p0 = get_first_point(i);
    Vect3F p1 = get_second_point(i);
    Vect3F p2 = get_third_point(i);
    int idx = i*3;
    scratch[idx] = Vec4F(p0.x, p1.x, p2.x, 0); // These 3 rows are the slowest
    scratch[idx+1] = Vec4F(p0.y, p1.y, p2.y, 0);
    scratch[idx+2] = Vec4F(p0.z, p1.z, p2.z, 0);
}

The last 3 rows of the loop are extremely slow: they take 90% of the time of my entire algorithm!

Is this normal? Can I make this shuffling faster? (scratch is a static variable and is 16-byte aligned. The function is called frequently, so I think the blocks of scratch should not disappear from the cache.)

asked Oct 29 '11 01:10 by klapancius


1 Answer

First of all, you shouldn't create 3 temporary vector objects. Instead of:

tri = triangles[i];
Vect3F p0 = points[indices[tri]];
Vect3F p1 = points[indices[tri+1]];
Vect3F p2 = points[indices[tri+2]];

You should just copy the data using memcpy(). Make a loop that goes over your entire collection and copies the raw data. It is the fastest way I can think of.

Using 3 variables runs a lot of constructors, which are painfully slow. The second way (from the comment) isn't much better, for the same reason.
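
As a rough illustration of the raw-copy idea, here is a minimal sketch. It makes several assumptions that are not in the question: Vect3F is taken to be a plain struct of three floats, the three points of each triplet are assumed to sit next to each other in one array (the question fetches them through get_first_point() and friends instead), and transpose_triplets is a made-up name. Note that a single memcpy() cannot produce the target layout by itself, because the destination interleaves components from three different points; the sketch therefore uses memcpy() only to pull the raw floats in and then writes them out with plain scalar stores, without constructing any vector objects.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the question's Vect3F: three tightly packed floats.
struct Vect3F { float x, y, z; };
static_assert(sizeof(Vect3F) == 3 * sizeof(float), "Vect3F must be 3 packed floats");

// For each triplet, writes 12 floats: x1,x2,x3,0, y1,y2,y3,0, z1,z2,z3,0.
// Assumption: `points` holds 3*count vectors, triplet after triplet.
// `out` must have room for 12*count floats.
void transpose_triplets(const Vect3F* points, std::size_t count, float* out)
{
    for (std::size_t i = 0; i < count; ++i) {
        float src[9];
        // Raw copy of the three points: x1,y1,z1,x2,y2,z2,x3,y3,z3.
        std::memcpy(src, points + 3 * i, sizeof src);

        float* dst = out + 12 * i;
        for (int c = 0; c < 3; ++c) {      // c = 0: x row, 1: y row, 2: z row
            dst[4 * c + 0] = src[c];       // component c of point 1
            dst[4 * c + 1] = src[3 + c];   // component c of point 2
            dst[4 * c + 2] = src[6 + c];   // component c of point 3
            dst[4 * c + 3] = 0.0f;         // padding lane
        }
    }
}
```

If the points have to be gathered through an index array instead, the same scalar stores apply; only the source reads change. Whether this actually beats the original loop depends on the compiler and on how the constructors are defined, so it is worth profiling both with optimizations enabled.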

answered Oct 24 '22 05:10 by Bartek Banachewicz