Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing 3D vectors in Python: numpy vs x,y,z fields

I am implementing a 3D Vector class in Python. My vector has coordinates x, y and z (all floats), and I need to decide how to store this information. I can see at least three options here:

1) Make three separate float fields: self.x, self.y, self.z

class Vector:

  def __init__(self, x, y, z):
    self.x = x
    self.y = y
    self.z = z

2) Make a list, say self.data, with three elements. I may also use a tuple if the objects can be constant.

class Vector:

  def __init__(self, x, y, z):
    self.data = [x,y,z]

3) Make a numpy array, say self.data, with three elements.

import numpy as np    

class Vector:

  def __init__(self, x, y, z):
    self.data = np.array([x,y,z])

For options (2) and (3), I could then implement properties and setters to access the single coordinates

@property
def x(self):
  return self.data[0]

4) Why not having some redundancy? I could have both a list (or tuple, or numpy array) and separate fields x, y and z.

The class is meant to be used to perform common operations such as vector addition, inner product, cross product, rotation, etc. Performance of these operations needs to be taken into account.

Is there a solution that I should prefer, and why?

like image 509
Antonio Avatar asked Apr 21 '17 11:04

Antonio


2 Answers

There are different aspects to this question and I can give you some hints on how these could be resolved. Note that these are meant as suggestions, you definetly need to see which one you like most.

Supporting linear algebra

You mentioned that you want to support linear algebra, such as vector addition (element-wise addition), cross product and inner product. These are avaiable for numpy.ndarrays so you could choose different approaches to supporting them:

  1. Simply use a numpy.ndarray and don't bother about your own class:

    import numpy as np
    vector1, vector2 = np.array([1, 2, 3]), np.array([3, 2, 1])
    np.add(vector1, vector2)      # vector addition
    np.cross(vector1, vector2)    # cross product
    np.inner(vector1, vector2)    # inner product
    

    There's no builtin vector rotation defined in numpy but there are several sources avaiable, for example "Rotation of 3D vector". So you would need to implement it yourself.

  2. You can create a class, independant of _how you store your attributes and provide an __array__ method. That way you can support (all) numpy functions as if your instances were numpy.ndarrays themselves:

    class VectorArrayInterface(object):
        def __init__(self, x, y, z):
            self.x, self.y, self.z = x, y, z
    
        def __array__(self, dtype=None):
            if dtype:
                return np.array([self.x, self.y, self.z], dtype=dtype)
            else:
                return np.array([self.x, self.y, self.z])
    
    vector1, vector2 = VectorArrayInterface(1, 2, 3), VectorArrayInterface(3, 2, 1)
    np.add(vector1, vector2)      # vector addition
    np.cross(vector1, vector2)    # cross product
    np.inner(vector1, vector2)    # inner product
    

    This will return the same results as in the first case so you can provide an interface for numpy functions without having a numpy-array. If you have a numpy-array stored in your class the __array__ method can simply return it so this could be an argument for storing your x, y and z as numpy.ndarray internally (because that's basically "for free").

  3. You can subclass np.ndarray. I won't go into the details here because that's an advanced topic that could easily justify a whole answer by itself. If you really consider this then you should have a look at the official documentation for "Subclassing ndarray". I don't recommend it, I worked on several classes that do subclass np.ndarray and there are several "rough egdes" down that path.

  4. You can implement the operations you need yourself. That's reinventing the wheel but it's educational and fun - if there's only a handful of them. I wouldn't recommend this for serious production because here as well are several "rough edges" that have been adressed in the numpy functions already. For example overflow or underflow issues, correctness of the functions, ...

    A possible implementation (not including rotation) could look like this (this time with an internally stored list):

    class VectorList(object):
        def __init__(self, x, y, z):
            self.vec = [x, y, z]
    
        def __repr__(self):
            return '{self.__class__.__name__}(x={self.vec[0]}, y={self.vec[1]}, z={self.vec[2]})'.format(self=self)
    
        def __add__(self, other):
            x1, y1, z1 = self.vec
            x2, y2, z2 = other.vec
            return VectorList(x1+x2, y1+y2, z1+z2)
    
        def crossproduct(self, other):
            x1, y1, z1 = self.vec
            x2, y2, z2 = other.vec
            return VectorList(y1*z2 - z1*y2,
                              z1*x2 - x1*z2,
                              x1*y2 - y1*x1)
    
        def scalarproduct(self, other):
            x1, y1, z1 = self.vec
            x2, y2, z2 = other.vec
            return x1*x2 + y1*y2 + z1*z2
    

    Note: You can implement these can-coded methods and implement the __array__ method I mentioned earlier. That way you can support any function expecting a numpy.ndarray and also have your homegrown methods. These approaches are not exclusive but you will have different results, the methods above return a scalar or a Vector but if you go through __array__ you'll get numpy.ndarrays back.

  5. Use a library containing a 3D vector. In some sense this is the easiest way in other aspects it could be very complicated. On the plus side an existing class will probably work out of the box and it's probably optimized in terms of performance. On the other hand you need to find an implementation that supports your use-case, you need to read the documentation (or figure out how it works by other means) and you could hit bugs or limitations that turn out to be desatrous for your project. Ah, and you get an additional dependency and you need to check if the license is compatible with your project. Additionally if you copy the implementation (check if the license allows that!) you need to maintain (even if it's just sync'ing) foreign code.

Performance

Performance is tricky in this case, the mentioned use-cases are quite simple and each task should be of the order of microseconds - so you should be able to perform several thousand to million operations per second already. Assuming you don't introduce an unnecessary bottleneck! However you can micro-optimize the operations.

Let me start with some general tipps:

  • Avoid numpy.ndarray <-> list/float operations. These are costly! If most of the operations use numpy.ndarrays you don't want to store your values in a list or as seperate attributes. Likewise if you want to access the individual values of the Vector or iterate over these values or perform operations on them as list then store them as list or seperate attributes.

  • Using numpy to operate on three values is relativly inefficient. numpy.ndarray is great for big array because it can store the values more efficiently (space) and scales much better than pure-python operations. However these advantages have some overhead that is significant for small arrays (say length << 100, that's an educated guess, not a fixed number!). A python solution (I use the one I already presented above) can be much faster than a numpy solution for such small arrays:

    class VectorArray:
        def __init__(self, x, y, z):
            self.data = np.array([x,y,z])
    
    # addition: python solution 3 times faster
    %timeit VectorList(1, 2, 3) + VectorList(3, 2, 1)  
    # 100000 loops, best of 3: 9.48 µs per loop
    %timeit VectorArray(1, 2, 3).data + VectorArray(3, 2, 1).data  
    # 10000 loops, best of 3: 35.6 µs per loop
    
    # cross product: python solution 16 times faster
    v = Vector(1, 2, 3)
    a = np.array([1,2,3])  # using a plain array to avoid the class-overhead
    %timeit v.crossproduct(v)  
    # 100000 loops, best of 3: 5.27 µs per loop
    %timeit np.cross(a, a)     
    # 10000 loops, best of 3: 84.9 µs per loop
    
    # inner product: python solution 4 times faster
    %timeit v.scalarproduct(v)  
    # 1000000 loops, best of 3: 1.3 µs per loop
    %timeit np.inner(a, a)      
    # 100000 loops, best of 3: 5.11 µs per loop
    

    However like I said these timings are of the order of microseconds so this is is literally micro-optimizing. However if your focus is on optimal performance of your class then you can be faster with pure-python and self-implemented functions.

    As soon as you try to do a lot of linear algebra operations you should leverage numpys vectorized operations. Most of these are incompatible with a class such as you describe and a completly different approach might be appropriate: For example a class that stores an array of array-vectors (a multidimensional array) in a way that interfaces correctly with numpys functions! But I think that's out of scope for this answer and wouldn't really answer your question which was limited to a class only storing 3 values.

  • I did some benchmarks using the same method with different approaches but that's a bit cheating. In general you shouldn't time one function call, you should measure the execution time of a program. In programs a tiny speed difference in a function that is called millions of times can make a much bigger overall difference than a big speed difference in a method that is only called a few times.... or not! I can only provide timings for functions because you haven't shared your program or use-cases so you need to find out which approach works best (correctness and performance) for you.

Conclusion

There are several other factors to consider which approach would be best, but these are more "meta"-reasons, not directly related to your program.

  • Re-inventing the wheel (implementing the functions yourself) is an opportunity to learn. You need to make sure it works correctly, you can time it and if it's too slow you can try different ways to optimize it. You start thinking about algorithmic complexities, constant factors, correctness, ... instead of thinking about "which function will solve my issue" or "how do I make that numpy-function correctly solve my issue".

  • Using NumPy for length-3 arrays is probably like "shooting with cannons at flies" but it is a great opportunity to become more familiar with numpy functions and in the future you'll know more about how NumPy works (vectorization, indexing, broadcasting, ...) even if NumPy wouldn't be a good fit for this question and answer.

  • Try different approaches and see how far you get. I learned a lot while answering this question and it was fun to try the approaches - compare the results for discrepancies, timing the method calls and evaluating their limitations!

like image 166
MSeifert Avatar answered Oct 26 '22 12:10

MSeifert


Taking into considerations of the use of the class Vector, I'd prefer having option-3. Since it yields numpy arrays, the vector operations are relatively easy, intuitive, and fast by using numpy.

In [81]: v1 = Vector(1.0, 2.0, 3.0)

In [82]: v2 = Vector(0.0, 1.0, 2.0)

In [83]: v1.data + v2.data
Out[83]: array([1.0, 3.0, 5.0])

In [85]: np.inner(v1.data, v2.data)
Out[85]: 8.0

These operations are already well optimized in numpy for performance.

like image 45
kmario23 Avatar answered Oct 26 '22 12:10

kmario23