A few years back, I was given a school assignment to parallelize a raytracer.
It was an easy assignment, and I really enjoyed working on it.
Today, I felt like profiling the raytracer to see whether I could make it run any faster (without completely overhauling the code). While profiling, I noticed something interesting:
```csharp
// Sphere.Intersect
public bool Intersect(Ray ray, Intersection hit)
{
    double a = ray.Dir.x * ray.Dir.x +
               ray.Dir.y * ray.Dir.y +
               ray.Dir.z * ray.Dir.z;
    double b = 2 * (ray.Dir.x * (ray.Pos.x - Center.x) +
                    ray.Dir.y * (ray.Pos.y - Center.y) +
                    ray.Dir.z * (ray.Pos.z - Center.z));
    double c = (ray.Pos.x - Center.x) * (ray.Pos.x - Center.x) +
               (ray.Pos.y - Center.y) * (ray.Pos.y - Center.y) +
               (ray.Pos.z - Center.z) * (ray.Pos.z - Center.z) - Radius * Radius;
    // more stuff here
}
```
According to the profiler, 25% of the CPU time was spent in `get_Dir` and `get_Pos`, which is why I decided to optimize the code as follows:
```csharp
// Sphere.Intersect
public bool Intersect(Ray ray, Intersection hit)
{
    Vector3d dir = ray.Dir, pos = ray.Pos;
    double xDir = dir.x, yDir = dir.y, zDir = dir.z,
           xPos = pos.x, yPos = pos.y, zPos = pos.z,
           xCen = Center.x, yCen = Center.y, zCen = Center.z;
    double a = xDir * xDir +
               yDir * yDir +
               zDir * zDir;
    double b = 2 * (xDir * (xPos - xCen) +
                    yDir * (yPos - yCen) +
                    zDir * (zPos - zCen));
    double c = (xPos - xCen) * (xPos - xCen) +
               (yPos - yCen) * (yPos - yCen) +
               (zPos - zCen) * (zPos - zCen) - Radius * Radius;
    // more stuff here
}
```
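As a sanity check that hoisting the getter calls into locals does not change the math, here is a self-contained sketch. `Vector3d` and `Ray` are minimal stand-ins modeled on the getters shown in this post (the real raytracer types are not shown), and `SphereMath`, `CoefficientsViaProperties` and `CoefficientsViaLocals` are hypothetical names. The second method is a slightly condensed variant of the hoisted version: it caches the differences `pos - center` rather than each individual component.

```csharp
using System;

// Minimal stand-in for the raytracer's vector type (assumption: public x/y/z fields).
public struct Vector3d
{
    public double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

// Minimal stand-in for Ray with the same trivial getters as in the post.
public class Ray
{
    private Vector3d _pos, _dir;
    public Ray(Vector3d pos, Vector3d dir) { _pos = pos; _dir = dir; }
    public Vector3d Pos { get { return _pos; } }
    public Vector3d Dir { get { return _dir; } }
}

public static class SphereMath
{
    // Original style: every ray.Dir / ray.Pos access calls a getter that returns a struct copy.
    public static (double a, double b, double c) CoefficientsViaProperties(Ray ray, Vector3d center, double radius)
    {
        double a = ray.Dir.x * ray.Dir.x + ray.Dir.y * ray.Dir.y + ray.Dir.z * ray.Dir.z;
        double b = 2 * (ray.Dir.x * (ray.Pos.x - center.x) +
                        ray.Dir.y * (ray.Pos.y - center.y) +
                        ray.Dir.z * (ray.Pos.z - center.z));
        double c = (ray.Pos.x - center.x) * (ray.Pos.x - center.x) +
                   (ray.Pos.y - center.y) * (ray.Pos.y - center.y) +
                   (ray.Pos.z - center.z) * (ray.Pos.z - center.z) - radius * radius;
        return (a, b, c);
    }

    // Hoisted style: each getter is called once, then only locals are read.
    public static (double a, double b, double c) CoefficientsViaLocals(Ray ray, Vector3d center, double radius)
    {
        Vector3d dir = ray.Dir, pos = ray.Pos;
        double dx = pos.x - center.x, dy = pos.y - center.y, dz = pos.z - center.z;
        double a = dir.x * dir.x + dir.y * dir.y + dir.z * dir.z;
        double b = 2 * (dir.x * dx + dir.y * dy + dir.z * dz);
        double c = dx * dx + dy * dy + dz * dz - radius * radius;
        return (a, b, c);
    }
}
```

For a ray starting at (0, 0, -5) and pointing along +z towards a unit sphere at the origin, both methods give a = 1, b = -10 and c = 24.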
The results were astonishing.
In the original code, running the raytracer with its default arguments (render a 1024x1024 image with only direct lighting and without anti-aliasing) took ~88 seconds.
With the modified code, the same run took a little less than 60 seconds.
That is a speedup of ~1.5 from this one small change.
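For reference, the speedup is simply the ratio of the two wall-clock times, using the rounded figures above:

$$
S = \frac{T_{\text{original}}}{T_{\text{modified}}} \approx \frac{88\ \text{s}}{60\ \text{s}} \approx 1.47
$$

Since the modified run finished in a little under 60 s, the actual ratio is slightly higher than 1.47.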
At first, I thought the getters for `Ray.Dir` and `Ray.Pos` were doing some work behind the scenes that slowed the program down.
Here are the getters for both:
```csharp
public Vector3d Pos
{
    get { return _pos; }
}
public Vector3d Dir
{
    get { return _dir; }
}
```
So both just return a `Vector3d`, and that's it.
I really wonder how calling the getter could take so much longer than accessing the variable directly.
Is it because of the CPU caching variables? Or does the overhead of calling these methods repeatedly add up? Or is the JIT handling the latter case better than the former? Or is there something else I'm not seeing?
Any insights would be greatly appreciated.
As @MatthewWatson suggested, I used a `Stopwatch` to time release builds outside of the debugger. To get rid of noise, I ran the tests multiple times. Measured this way, the former code takes ~21 seconds (between 20.7 and 20.9) to finish, whereas the latter takes only ~19 seconds (between 19 and 19.2).
The difference has become negligible, but it is still there.
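A minimal timing harness in the spirit of the measurement described above might look like the sketch below. It is an assumption, not the actual benchmark code used here; `Timing` and `MedianMilliseconds` are hypothetical names. It makes one untimed warm-up call so that JIT compilation is not counted, then reports the median of several `System.Diagnostics.Stopwatch` runs, which is less sensitive to outliers than a single run. Build it in Release and start it without the debugger attached; otherwise the JIT may not optimize the code being measured.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs `work` once untimed as a warm-up (so JIT compilation is excluded),
    // then times `runs` further executions and returns the median in milliseconds.
    public static double MedianMilliseconds(Action work, int runs)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        work(); // warm-up run: triggers JIT compilation of everything `work` calls

        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            work();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        // Median of the sorted samples (average of the two middle values for an even count).
        Array.Sort(samples);
        return runs % 2 == 1
            ? samples[runs / 2]
            : (samples[runs / 2 - 1] + samples[runs / 2]) / 2.0;
    }
}
```

A call would look like `Timing.MedianMilliseconds(() => RenderScene(), 5)`, where `RenderScene` is a hypothetical stand-in for the raytracer's entry point. For the figures above, the remaining gap is 21 / 19 ≈ 1.1, i.e. roughly 10%.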
I'd be willing to bet that the original code is so much slower because of a quirk in C# involving properties of struct type. It's not exactly intuitive, but this type of property is inherently slow. Why? Because structs are value types: a getter returns a copy of the struct, not a reference to it. So in order to access `ray.Dir.x`, you have to:
1. Load `ray`.
2. Call `get_Dir` and store the result in a temporary variable. This involves copying the entire struct, even though only the field `x` is ever used.
3. Access the field `x` from the temporary copy.
Looking at the original code, the get accessors are called 18 times. This is a huge waste, because it means that the entire struct is copied 18 times overall. In your optimized code, there are only two copies: `Dir` and `Pos` are each called only once, and further accesses to the values consist only of the third step from above, reading `x` (or `y`, `z`) from the temporary copy.
To sum it up, structs and properties do not go together.
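The count of 18 getter calls can be checked mechanically. The sketch below is hypothetical instrumentation, not code from the raytracer: `CountingRay` mirrors the `Ray` getters from the question but increments a static `GetterCalls` counter on each access, and `Vector3d` is a minimal stand-in with public `x`, `y`, `z` fields. Evaluating the original expressions for `a`, `b` and `c` makes 6 + 6 + 6 = 18 getter calls, while the hoisted version makes 2.

```csharp
using System;

// Minimal stand-in for the raytracer's vector type (assumption).
public struct Vector3d
{
    public double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

// Hypothetical instrumented Ray: each getter call is counted.
public class CountingRay
{
    public static int GetterCalls;
    private Vector3d _pos, _dir;
    public CountingRay(Vector3d pos, Vector3d dir) { _pos = pos; _dir = dir; }
    public Vector3d Pos { get { GetterCalls++; return _pos; } }
    public Vector3d Dir { get { GetterCalls++; return _dir; } }
}

public static class CallCount
{
    // Same expressions as the original Sphere.Intersect; returns the number of getter calls.
    public static int Original(CountingRay ray, Vector3d center, double radius)
    {
        CountingRay.GetterCalls = 0;
        double a = ray.Dir.x * ray.Dir.x + ray.Dir.y * ray.Dir.y + ray.Dir.z * ray.Dir.z;
        double b = 2 * (ray.Dir.x * (ray.Pos.x - center.x) +
                        ray.Dir.y * (ray.Pos.y - center.y) +
                        ray.Dir.z * (ray.Pos.z - center.z));
        double c = (ray.Pos.x - center.x) * (ray.Pos.x - center.x) +
                   (ray.Pos.y - center.y) * (ray.Pos.y - center.y) +
                   (ray.Pos.z - center.z) * (ray.Pos.z - center.z) - radius * radius;
        return CountingRay.GetterCalls;
    }

    // Hoisted version: one call per getter, everything else reads the local copies.
    public static int Hoisted(CountingRay ray, Vector3d center, double radius)
    {
        CountingRay.GetterCalls = 0;
        Vector3d dir = ray.Dir, pos = ray.Pos;
        double a = dir.x * dir.x + dir.y * dir.y + dir.z * dir.z;
        double b = 2 * (dir.x * (pos.x - center.x) +
                        dir.y * (pos.y - center.y) +
                        dir.z * (pos.z - center.z));
        double c = (pos.x - center.x) * (pos.x - center.x) +
                   (pos.y - center.y) * (pos.y - center.y) +
                   (pos.z - center.z) * (pos.z - center.z) - radius * radius;
        return CountingRay.GetterCalls;
    }
}
```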
It has something to do with the fact that in C#, structs are value types. You are passing around the value itself, rather than a pointer to the value.
In debug mode, optimizations like this are skipped to provide a better debugging experience. Even in release mode, you'll find that most jitters don't do this often. I don't know exactly why, but I believe it is because the field is not always word-aligned. Modern CPUs have odd performance requirements. :-)
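To make the value-versus-pointer distinction from the answers above concrete, here is a small sketch with minimal stand-in types (assumptions, not the raytracer's real classes). The ordinary `Dir` getter hands back an independent copy, so a later change to the backing field is not visible through it. `DirRef` is a hypothetical alternative that uses a `ref readonly` return (C# 7.2 or later), which hands back a read-only reference to the field instead of copying it.

```csharp
using System;

// Minimal mutable stand-in for the raytracer's vector type (assumption).
public struct Vector3d
{
    public double x, y, z;
    public Vector3d(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

public class Ray
{
    private Vector3d _dir;
    public Ray(Vector3d dir) { _dir = dir; }

    // Ordinary getter: every call copies the whole struct out of the field.
    public Vector3d Dir { get { return _dir; } }

    // Hypothetical alternative (C# 7.2+): returns a read-only reference to the field, no copy.
    public ref readonly Vector3d DirRef => ref _dir;

    // Mutates the backing field so the copy/reference difference becomes observable.
    public void SetDirX(double x) { _dir.x = x; }
}
```

One caveat: invoking methods or properties on a non-`readonly` struct through a `ref readonly` reference can make the compiler insert a defensive copy, so this approach pairs best with a `readonly struct` when the type can be made immutable. Plain field reads like `alias.x` do not trigger that copy.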