I have a simple raytracer in python. rendering an image 200x200 takes 4 minutes, which is definitely too much for my taste. I want to improve the situation.
Some points: I shoot multiple rays per each pixel (to provide antialiasing) for a grand total of 16 rays per pixel. 200x200x16 is a grand total of 640000 rays. Each ray must be tested for impact on multiple Sphere objects in the scene. Ray is also a rather trivial object
class Ray(object):
def __init__(self, origin, direction):
self.origin = numpy.array(origin)
self.direction = numpy.array(direction)
Sphere is slightly more complex, and carries the logic for hit/nohit:
class Sphere(object):
def __init__(self, center, radius, color):
self.center = numpy.array(center)
self.radius = numpy.array(radius)
self.color = color
@profile
def hit(self, ray):
temp = ray.origin - self.center
a = numpy.dot(ray.direction, ray.direction)
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius * self.radius
disc = b * b - 4.0 * a * c
if (disc < 0.0):
return None
else:
e = math.sqrt(disc)
denom = 2.0 * a
t = (-b - e) / denom
if (t > 1.0e-7):
normal = (temp + t * ray.direction) / self.radius
hit_point = ray.origin + t * ray.direction
return ShadeRecord.ShadeRecord(normal=normal,
hit_point=hit_point,
parameter=t,
color=self.color)
t = (-b + e) / denom
if (t > 1.0e-7):
normal = (temp + t * ray.direction) / self.radius hit_point = ray.origin + t * ray.direction
return ShadeRecord.ShadeRecord(normal=normal,
hit_point=hit_point,
parameter=t,
color=self.color)
return None
Now, I ran some profiling, and it appears that the longest processing time is in the hit() function
ncalls tottime percall cumtime percall filename:lineno(function)
2560000 118.831 0.000 152.701 0.000 raytrace/objects/Sphere.py:12(hit)
1960020 42.989 0.000 42.989 0.000 {numpy.core.multiarray.array}
1 34.566 34.566 285.829 285.829 raytrace/World.py:25(render)
7680000 33.796 0.000 33.796 0.000 {numpy.core._dotblas.dot}
2560000 11.124 0.000 163.825 0.000 raytrace/World.py:63(f)
640000 10.132 0.000 189.411 0.000 raytrace/World.py:62(hit_bare_bones_object)
640023 6.556 0.000 170.388 0.000 {map}
This does not surprise me, and I want to reduce this value as much as possible. I pass to line profiling, and the result is
Line # Hits Time Per Hit % Time Line Contents
==============================================================
12 @profile
13 def hit(self, ray):
14 2560000 27956358 10.9 19.2 temp = ray.origin - self.center
15 2560000 17944912 7.0 12.3 a = numpy.dot(ray.direction, ray.direction)
16 2560000 24132737 9.4 16.5 b = 2.0 * numpy.dot(temp, ray.direction)
17 2560000 37113811 14.5 25.4 c = numpy.dot(temp, temp) - self.radius * self.radius
18 2560000 20808930 8.1 14.3 disc = b * b - 4.0 * a * c
19
20 2560000 10963318 4.3 7.5 if (disc < 0.0):
21 2539908 5403624 2.1 3.7 return None
22 else:
23 20092 75076 3.7 0.1 e = math.sqrt(disc)
24 20092 104950 5.2 0.1 denom = 2.0 * a
25 20092 115956 5.8 0.1 t = (-b - e) / denom
26 20092 83382 4.2 0.1 if (t > 1.0e-7):
27 20092 525272 26.1 0.4 normal = (temp + t * ray.direction) / self.radius
28 20092 333879 16.6 0.2 hit_point = ray.origin + t * ray.direction
29 20092 299494 14.9 0.2 return ShadeRecord.ShadeRecord(normal=normal, hit_point=hit_point, parameter=t, color=self.color)
So, it appears that most of the time is spent in this chunk of code:
temp = ray.origin - self.center
a = numpy.dot(ray.direction, ray.direction)
b = 2.0 * numpy.dot(temp, ray.direction)
c = numpy.dot(temp, temp) - self.radius * self.radius
disc = b * b - 4.0 * a * c
Where I don't really see a lot to optimize. Do you have any idea how to make this code more performant without going C ?
For reduction of computational effort in this paper three improvements for the ray tracing algorithm will be introduced. Two of them are extensions of existing improvements, the third is a new idea: (1) use of enclosures in the object tree; (2) bounding the ray length and (3) dynamical temporary object trees.
Real-Time Ray Tracing is the biggest leap in computer graphics in years, bringing realistic lighting, shadows and effects to games, enhancing image quality, gameplay and immersion.
Ray tracing is a lighting technique that brings an extra level of realism to games. It emulates the way light reflects and refracts in the real world, providing a more believable environment than what's typically seen using static lighting in more traditional games.
Looking at your code, it looks like your main problem is that you have lines of code that are being called 2560000 times. That will tend to take a lot of time regardless what kind of work you are doing in that code. However, using numpy, you can aggregate alot of this work into a small number of numpy calls.
The first thing to do is to combine your rays together into large arrays. Instead of using a Ray object that has 1x3 vectors for origin and direction use Nx3 arrays that have all of the rays you need for the hit detection. The top of your hit function will end up looking like this:
temp = rays.origin - self.center
b = 2.0 * numpy.sum(temp * rays.direction,1)
c = numpy.sum(numpy.square(temp), 1) - self.radius * self.radius
disc = b * b - 4.0 * c
For the next part you can use
possible_hits = numpy.where(disc >= 0.0)
a = a[possible_hits]
disc = disc[possible_hits]
...
to continue with just the values that pass the discriminant test. You can easily get orders of magnitude performance improvements this way.
1) Ray tracing is fun but if you care at all about performance, dump python and switch to C. Not C++ unless you are some kind of super expert, just C.
2) The big win in scenes with multiple (20 or more) objects is to use a spatial index to reduce the number of intersection tests. Popular options are kD-trees, OctTrees, AABB.
3) If you're serious, check out ompf.org - it is the resource for this.
4) Don't go to ompf with python asking about optimization - most people there can shoot 1 Million to 2 Million rays per second through an indoor scene with 100 thousand triangles... Per core.
I love Python and ray tracing, but would never consider putting them together. In this case, the correct optimization is to switch languages.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With