Background: So I'm working on a raytracer.. for my construction of the spatial partitioning scheme, I initially had some code like this:
if (msize <= 2) { // create a leaf node
Model **models = new Model*[msize];
for (uint i=0; i<msize; ++i)
models[i] = &mlist[i];
*arrayPtr = Node(models, msize); // class Node contains a copy of models
... increment arrayPtr ...
return;
}
Basically, after this spatial partitioning tree is constructed, rays traverse the tree looking for Models, which are all stored in one big array. Leaf Nodes contain pointers to an array of pointers of Models.
Then I realized that hey, there's no reason for me to add that extra level of indirection; if I arrange my Models properly, I can get the leaf Nodes to point directly to the big array of models. Models adjacent to each other in the big array would all belong to a given leaf node, so the leaves would contain pointers to Models. So I did this, and tested it with everything else held constant.
Now one would think that this would obviously speed up the program. Well, it does speed up the single-threaded version (by about 10%), but it slows down the multithreaded one (by about 15%! Which is quite significant if you're doing heavy optimizing.) I'm quite at a loss on how to tackle this -- I thought indirection was bad, I thought reducing memory usage was good, especially for multithreading.. there's no writing whatsoever to the leaf node or the Model, all writing is done to a separate data structure.
Any pointers / suggestions on how to analyze the problem would be great.
Some miscellaneous statistics: cachegrind tells me that there are less instruction refs / cache misses for the double-indirection approach, but more data refs / cache misses. The difference isn't that large though, for both.
Edit: As requested, the data structure that I am concerned with:
class Node {
ushort type;
union {
ushort axisID;
ushort childrenSize;
};
union {
Model **models;
Node *rightChild;
};
float leftPlane, rightPlane;
... public methods and stuff ...
}
I basically change Model **models
to Model *models
, and then I get the speed dip. Class Model
itself contains a pointer to two abstract classes, Shape
and Material
. All the classes mentioned here are block-allocated, with the exception of Material
since at the moment I'm just using one.
My first guess is that you are running into false-sharing. If you have multiple threads both modifying memory in the same cache line, the hardware is going to spend a lot of time passing ownership of the cache line between processors.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With