
How to combine class design and matrix math efficiently?

For a while now I've been mentally wrestling with the clash of two design philosophies for modeling physical systems, and I'm wondering what kinds of solutions the community has come up with for this.

For more complex simulations, I love the abstraction of creating classes for objects: object instances can be identified with the real objects I want to study, and certain attributes of an instance represent physical features of the real-life object. Let's take ballistic particle systems as a simple example:

import numpy as np

class Particle(object):
    def __init__(self, x=0, y=0, z=0):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return "x={}\ny={}\nz={}".format(self.x, self.y, self.z)
    def apply_lateral_wind(self, dx, dy):
        self.x += dx
        self.y += dy

If I initialize this with a million values I might do this:

start_values = np.random.random((int(1e6),3))

particles = [Particle(*i) for i in start_values]

Now, let's assume that I need to do a specific thing to all my particles, like applying a lateral wind vector that results in an x, y shift. Since I only have a flat list of all my particles, I need to loop over every one of them, which takes this amount of time:

%timeit _ = [p.apply_lateral_wind(0.5, 1.2) for p in particles]
1 loop, best of 3: 551 ms per loop

The opposing paradigm, which is obviously more efficient, is to stay at the NumPy level and apply the mathematical operation directly to the array, which is more than a factor of 10 faster:

%timeit start_values[...,:2] += np.array([0.5,1.2])
10 loops, best of 3: 20.3 ms per loop

My question is: are there any design patterns that efficiently combine these two approaches, so that one doesn't lose much efficiency? I personally find it much easier to think in terms of object methods and attributes; it's that much clearer in my head, and to me it is also the underlying reason for the success of object-oriented programming (and its use in physical modeling). But the drawbacks are obvious. Is an elegant back-and-forth between these approaches somehow possible?

asked Feb 14 '16 by K.-Michael Aye


2 Answers

You can define a class that takes care of multiple particles:

import numpy as np

class Particles(object):
    def __init__(self, coords):
        self.coords = coords

    def __repr__(self):
        return "Particles(coords={})".format(self.coords)

    def apply_lateral_wind(self, dx, dy):
        self.coords[:, 0] += dx
        self.coords[:, 1] += dy

start_values = np.random.random((int(1e6), 3))        
particles = Particles(start_values)

The timings on my system show that this is actually faster than your NumPy version, presumably because it does not construct an additional array and does not have to worry about broadcasting:

%timeit particles.apply_lateral_wind(0.5, 1.2)
100 loops, best of 3: 3.17 ms per loop

whereas using your numpy example gives

%timeit start_values[..., :2] += np.array([0.5, 1.2])
10 loops, best of 3: 21.1 ms per loop
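If you still want to reason about individual particles on top of this array-backed design, one option (my sketch, not part of the answer above; the `__getitem__` method is an assumed extension) is to hand out NumPy row views, so single-particle edits write through to the shared array while bulk operations stay vectorized:

```python
import numpy as np

class Particles(object):
    """Array-backed particle pool; each particle is a view into one row."""
    def __init__(self, coords):
        self.coords = np.asarray(coords, dtype=float)

    def apply_lateral_wind(self, dx, dy):
        # vectorized bulk update on all particles at once
        self.coords[:, :2] += (dx, dy)

    def __getitem__(self, i):
        # basic indexing returns a view, not a copy, so mutating
        # the returned row updates the shared coords array
        return self.coords[i]

    def __len__(self):
        return len(self.coords)

particles = Particles(np.zeros((4, 3)))
particles.apply_lateral_wind(0.5, 1.2)   # bulk, NumPy-speed
p0 = particles[0]                        # per-particle view of row 0
p0[2] = 9.0                              # writes through to particles.coords
```

The trade-off is that a row view is a bare array, not a `Particle` with methods; wrapping rows in a thin proxy class restores the method syntax but reintroduces per-object overhead for anything done in a Python loop.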
answered Oct 11 '22 by David Zwicker


Cython extension types could be an interesting option if you really want to use an object instead of a bare NumPy array. (Although they also come with some limitations, which are described in the linked documentation page.)

I edited your sample code a bit to Cythonize it, and came up with this:

cdef class Particle(object):
    cdef double x, y, z

    def __init__(self, double x=0, double y=0, double z=0):
        self.x = x
        self.y = y
        self.z = z
    def __repr__(self):
        return "x={}\ny={}\nz={}".format(self.x, self.y, self.z)
    cpdef apply_lateral_wind(self, double dx, double dy):
        self.x += dx
        self.y += dy

With this basic Cython version, I got the following timing when looping over 1 million particles:

>>> %timeit _ = [p.apply_lateral_wind(0.5, 1.2) for p in particles]
10 loops, best of 3: 102 ms per loop

This is about a factor of 5 faster than the plain Python version, which took 532 ms on the same machine. That being said, it's clearly still an order of magnitude slower than just using the bare NumPy array, but I guess that's the price of readability.
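For completeness, the `cdef class` above has to be compiled before it can be imported. A minimal build configuration (my addition, assuming the class is saved as `particle.pyx`) would look like:

```python
# setup.py -- assumes the cdef class above lives in particle.pyx
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("particle.pyx"))
```

Running `python setup.py build_ext --inplace` then produces an extension module you can import as `from particle import Particle`. In an IPython/Jupyter session, the `%%cython` cell magic is a convenient alternative that compiles the class on the fly.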

answered Oct 11 '22 by jb326