For a while now I've been mentally wrestling with the clash of two design philosophies for physical systems modeling, and I'm wondering what kind of solutions the community has come up with for this.
For more complex simulations, I love the abstraction of creating classes for objects: object instances can be identified with the real objects I want to study, and certain attributes of an instance represent physical features of the real-life object. Let's take a ballistic particle system as a simple example:
class Particle(object):
    def __init__(self, x=0, y=0, z=0):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return "x={}\ny={}\nz={}".format(self.x, self.y, self.z)

    def apply_lateral_wind(self, dx, dy):
        self.x += dx
        self.y += dy
If I initialize this with a million values, I might do this:

import numpy as np

start_values = np.random.random((int(1e6), 3))
particles = [Particle(*i) for i in start_values]
Now, let's assume I need to do a specific thing to all my particles, like applying a lateral wind vector, which for this operation is just an x, y shift. Since all I have is a bunch (a list) of particles, I need to loop over every one of them, and that takes this amount of time:
%timeit _ = [p.apply_lateral_wind(0.5, 1.2) for p in particles]
1 loop, best of 3: 551 ms per loop
Now, the opposing paradigm, which is obviously more efficient, is to stay at the NumPy level and perform the mathematical operation directly on the array, which is more than a factor of 10 faster:
%timeit start_values[...,:2] += np.array([0.5,1.2])
10 loops, best of 3: 20.3 ms per loop
My question is: are there any design patterns that efficiently combine these two approaches, so that one doesn't lose too much performance? I personally find it much easier to think in terms of object methods and attributes; it's that much clearer in my head, and for me that is also the underlying reason for the success of object-oriented programming (or at least its use in physical modeling). But the drawbacks are obvious. I would love to know whether some elegant back-and-forth between these approaches is possible.
You can define a class that takes care of multiple particles:
class Particles(object):
    def __init__(self, coords):
        self.coords = coords

    def __repr__(self):
        return "Particles(coords={})".format(self.coords)

    def apply_lateral_wind(self, dx, dy):
        self.coords[:, 0] += dx
        self.coords[:, 1] += dy
start_values = np.random.random((int(1e6), 3))
particles = Particles(start_values)
The timings on my system show that this is actually faster than your NumPy version, presumably because it does not construct an additional array and does not have to worry about broadcasting:
%timeit particles.apply_lateral_wind(0.5, 1.2)
100 loops, best of 3: 3.17 ms per loop
whereas using your numpy example gives
%timeit start_values[..., :2] += np.array([0.5, 1.2])
10 loops, best of 3: 21.1 ms per loop
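One thing worth pointing out about this design: apply_lateral_wind updates the array in place, so the Particles instance and the original start_values array alias the same memory. A small sketch (using zeros instead of random values so the result is easy to check):

```python
import numpy as np

class Particles(object):
    def __init__(self, coords):
        self.coords = coords

    def apply_lateral_wind(self, dx, dy):
        # in-place update of the x and y columns of every particle at once
        self.coords[:, 0] += dx
        self.coords[:, 1] += dy

start_values = np.zeros((4, 3))
particles = Particles(start_values)
particles.apply_lateral_wind(0.5, 1.2)

# The wrapped array and the original array are the same object,
# so the shift is visible through both names.
print(particles.coords is start_values)  # True
print(start_values[0])                   # [0.5 1.2 0. ]
```

If you don't want that coupling, copy the array on construction (self.coords = np.array(coords)) at the cost of extra memory.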
Cython extension types could be an interesting option if you really want to use an object instead of a bare NumPy array (although they also come with some limitations, which are described on that linked documentation page).
I edited your sample code a bit to Cythonize it, and came up with this:
cdef class Particle(object):
    cdef double x, y, z

    def __init__(self, double x=0, double y=0, double z=0):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return "x={}\ny={}\nz={}".format(self.x, self.y, self.z)

    cpdef apply_lateral_wind(self, double dx, double dy):
        self.x += dx
        self.y += dy
With this basic Cython version, I got the following timing when looping over 1 million particles:
>>> %timeit _ = [p.apply_lateral_wind(0.5, 1.2) for p in particles]
10 loops, best of 3: 102 ms per loop
This is about a factor of 5 faster than the plain Python version, which took 532 ms on the same machine. That being said, it's clearly still an order of magnitude slower than just using the bare NumPy array, but I guess that's the price of readability.
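Another way to get some of the readability back without leaving NumPy is to keep all data in one array and hand out lightweight per-particle view objects whose attributes are properties into rows of that array. This is a sketch of my own (the names ParticleView and ParticleSystem are made up for illustration): bulk operations stay vectorized, while individual particles still present an object interface.

```python
import numpy as np

class ParticleView(object):
    """Lightweight handle onto one row of the shared coordinate array."""
    __slots__ = ("_coords", "_i")

    def __init__(self, coords, i):
        self._coords = coords
        self._i = i

    @property
    def x(self):
        return self._coords[self._i, 0]

    @x.setter
    def x(self, value):
        self._coords[self._i, 0] = value

    @property
    def y(self):
        return self._coords[self._i, 1]

    def __repr__(self):
        return "ParticleView(x={}, y={}, z={})".format(*self._coords[self._i])

class ParticleSystem(object):
    def __init__(self, coords):
        self.coords = coords

    def apply_lateral_wind(self, dx, dy):
        # fast path: one vectorized operation over all particles
        self.coords[:, :2] += (dx, dy)

    def __getitem__(self, i):
        # slow path: per-particle object access when you need it
        return ParticleView(self.coords, i)

system = ParticleSystem(np.zeros((1000, 3)))
system.apply_lateral_wind(0.5, 1.2)  # vectorized bulk update
p = system[0]
p.x += 1.0                           # single-particle update through the view
```

Since the views hold a reference into the shared array rather than their own copies, changes made through either interface are immediately visible through the other. Creating a million view objects up front would defeat the purpose; the idea is to create them lazily, only when single-particle access is actually needed.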