I have lots of (x1,y1,z1),(x2,y2,z2),(x3,y3,z3) single-precision vector triplets, and I want to reorder them so that (x1,y1,z1),(x2,y2,z2),(x3,y3,z3) becomes (x1,x2,x3,0,y1,y2,y3,0,z1,z2,z3,0).
The goal is to prepare the dataset for an SSE-based calculation. I have the following code to do this:
for (int i = 0; i < count; i++)
{
    Vect3F p0 = get_first_point(i);
    Vect3F p1 = get_second_point(i);
    Vect3F p2 = get_third_point(i);
    int idx = i * 3;
    scratch[idx]     = Vec4F(p0.x, p1.x, p2.x, 0); // These 3 lines are the slowest
    scratch[idx + 1] = Vec4F(p0.y, p1.y, p2.y, 0);
    scratch[idx + 2] = Vec4F(p0.z, p1.z, p2.z, 0);
}
The last 3 lines of the loop are extremely slow: they take 90% of the running time of my entire algorithm!
Is this normal? Can I make this shuffling faster? (scratch is a static variable and is 16-byte aligned. The function is called frequently, so I think the blocks of scratch should not be evicted from the cache.)
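One thing worth trying, shown here as an unbenchmarked sketch: the slow lines build three Vec4F temporaries per iteration and then copy them into scratch, whereas the same layout can be produced by writing the twelve floats straight into a plain float buffer. The sketch assumes Vect3F is a plain struct of three floats and that the three points of item i sit consecutively in an input array; transpose_triplets is a hypothetical name standing in for the question's get_first_point/get_second_point/get_third_point calls.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for the question's Vect3F: three packed floats.
struct Vect3F { float x, y, z; };

// Hypothetical helper: reorders `count` triplets into SoA order.
// Assumption: tri[3*i], tri[3*i+1], tri[3*i+2] are the three points of item i.
// out must hold 12 * count floats; each item produces
//   x0 x1 x2 0   y0 y1 y2 0   z0 z1 z2 0
void transpose_triplets(const Vect3F* tri, float* out, std::size_t count)
{
    assert(tri != nullptr && out != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        const Vect3F& p0 = tri[3 * i];      // references, not copies
        const Vect3F& p1 = tri[3 * i + 1];
        const Vect3F& p2 = tri[3 * i + 2];
        float* o = out + 12 * i;            // 3 groups of 4 floats per item
        o[0] = p0.x; o[1] = p1.x; o[2]  = p2.x; o[3]  = 0.0f;
        o[4] = p0.y; o[5] = p1.y; o[6]  = p2.y; o[7]  = 0.0f;
        o[8] = p0.z; o[9] = p1.z; o[10] = p2.z; o[11] = 0.0f;
    }
}
```

Whether this beats the original depends on the compiler; with optimizations on, trivial temporaries are often eliminated anyway, so it is worth profiling both.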
First of all, you shouldn't create 3 temporary vector objects. Instead of:
tri = triangles[i];
Vect3F p0 = points[indices[tri]];
Vect3F p1 = points[indices[tri+1]];
Vect3F p2 = points[indices[tri+2]];
you should just copy the data using memcpy(): write a loop that goes over your entire collection and copies the raw data. It is the fastest way I can think of.
Using 3 variables runs a lot of constructors, which are painfully slow. The second way (from the comment) isn't much better, for the same reason.
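A different technique from the memcpy suggestion (not part of this answer) is to do the reorder in SSE registers. Each output group is a 3x3 transpose, so padding it to 4x4 with a zero row lets the _MM_TRANSPOSE4_PS macro from <xmmintrin.h> produce the x, y and z rows, including the trailing 0 in each, in one step. This is a sketch assuming an x86/x86-64 target with SSE, Vect3F as three packed floats, the three points of each item stored consecutively, and an output buffer that is 16-byte aligned; transpose_triplets_sse is a hypothetical name, and the code has not been benchmarked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // __m128, _mm_setr_ps, _mm_store_ps, _MM_TRANSPOSE4_PS

// Assumed stand-in for the question's Vect3F: three packed floats.
struct Vect3F { float x, y, z; };

// Hypothetical SSE version. out must be 16-byte aligned and hold 12 * count
// floats. Each item i uses tri[3*i], tri[3*i+1], tri[3*i+2] (assumed layout).
void transpose_triplets_sse(const Vect3F* tri, float* out, std::size_t count)
{
    // _mm_store_ps faults on unaligned addresses; 12 floats = 48 bytes per
    // item, so every group stays aligned if out is.
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    for (std::size_t i = 0; i < count; ++i) {
        const Vect3F& a = tri[3 * i];
        const Vect3F& b = tri[3 * i + 1];
        const Vect3F& c = tri[3 * i + 2];
        __m128 r0 = _mm_setr_ps(a.x, a.y, a.z, 0.0f);  // x0 y0 z0 0
        __m128 r1 = _mm_setr_ps(b.x, b.y, b.z, 0.0f);  // x1 y1 z1 0
        __m128 r2 = _mm_setr_ps(c.x, c.y, c.z, 0.0f);  // x2 y2 z2 0
        __m128 r3 = _mm_setzero_ps();                  // padding row
        // In-place 4x4 transpose: r0 = x0 x1 x2 0, r1 = y0 y1 y2 0,
        // r2 = z0 z1 z2 0, r3 = 0 0 0 0 (unused).
        _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
        float* o = out + 12 * i;
        _mm_store_ps(o, r0);
        _mm_store_ps(o + 4, r1);
        _mm_store_ps(o + 8, r2);
    }
}
```

Because the zero row becomes the fourth column after the transpose, the padding zeros come for free and every store is a full aligned 16-byte write.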