Counting Algorithm Performance Optimization in Pypy vs Python (Numpy vs List)

My expectation was that pypy could be as much as an order of magnitude faster than python, but the results indicate that pypy is actually slower than python in the numpy scenarios.

I have two questions:

  1. Why is pypy significantly slower with numpy?
  2. Is there anything I can do to optimize my algorithm to make pypy (or python) faster?

Resulting timings:

Python 2.7.5

  • # points: 16,777,216 (8 ** 3 * 32 ** 3)
  • Xrange time: 1487.15 ms
  • Xrange Numpy time: 2553.98 ms
  • Point Gen Time: 6162.23 ms
  • Numpy Gen Time: 13894.73 ms

Pypy 2.2.1

  • # points: 16,777,216 (8 ** 3 * 32 ** 3)
  • Xrange time: 129.48 ms
  • Xrange Numpy time: 4644.12 ms
  • Point Gen Time: 4643.82 ms
  • Numpy Gen Time: 44168.98 ms

Algorithm:

I am generating a list of points in space using a simple algorithm, and I'm trying to optimize it:

def generate(size=32, point=(0, 0, 0), width=32):
    """
    Generate points in space around a center point with a specific width
    and number of divisions (size).
    """
    X, Y, Z = point
    half = width * 0.5
    scale = width / float(size)   # float() avoids integer truncation on Python 2
    offset = scale * 0.5          # offset to the center of each cell
    # shift the origin to the center of the corner cell
    X = X + offset - half
    Y = Y + offset - half
    Z = Z + offset - half
    for x in xrange(size):
        x = (x * scale) + X
        for y in xrange(size):
            y = (y * scale) + Y
            for z in xrange(size):
                z = (z * scale) + Z
                yield (x, y, z)
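
For example, with a tiny grid the output is easy to verify by hand (a quick sanity check, not part of the timings):

# 2 divisions across a width of 2 centered on the origin:
# the 8 points are the centers of the 8 unit cells
points = list(generate(size=2, point=(0, 0, 0), width=2))
print points[0], points[-1]
# (-0.5, -0.5, -0.5) (0.5, 0.5, 0.5)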

While optimizing, I started looking at using pypy rather than python, and in comparing the two I came up with a few different scenarios (a timing-wrapper sketch follows the list):

  • counting using xrange

    rsize = 8    # size of region
    csize = 32   # size of chunk
    number_of_points = rsize ** 3 * csize ** 3
    [x for x in xrange(number_of_points)]
    
  • counting using xrange with numpy

    rsize = 8    # size of region
    csize = 32   # size of chunk
    number_of_points = rsize ** 3 * csize ** 3
    np.array([x for x in xrange(number_of_points)])
    
  • running with the algorithm above

    rsize = 8    # size of region
    csize = 32   # size of chunk
    [p
     for rp in generate(size=rsize, width=rsize*csize)
     for p in generate(size=csize, width=csize, point=rp)]
    
  • running the algorithm above with numpy

    rsize = 8    # size of region
    csize = 32   # size of chunk
    np.array([p
     for rp in generate(size=rsize, width=rsize*csize)
     for p in generate(size=csize, width=csize, point=rp)])
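
The exact harness isn't shown above; the timings were collected with a simple wall-clock wrapper along these lines (a sketch, and `timed` is my own name):

import time

def timed(label, fn):
    """Run fn once and report elapsed wall-clock time in ms."""
    start = time.time()
    fn()
    print '%s: %.2f ms' % (label, (time.time() - start) * 1000)

rsize, csize = 8, 32
timed('Xrange time', lambda: [x for x in xrange(rsize ** 3 * csize ** 3)])
timed('Point Gen Time', lambda: [p
      for rp in generate(size=rsize, width=rsize * csize)
      for p in generate(size=csize, width=csize, point=rp)])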
    

Background:

I am attempting to create a voxel engine, and I want to optimize my algorithm to bring down generation times to a manageable level. While I clearly won't achieve anything close to Java/C++, I'd like to push python (or pypy) as far as I can.

From some earlier timings, I noticed that list lookups are significantly faster than dictionary lookups. Lists are also faster to read than tuples (unexpectedly), though tuples are faster to create. Numpy has even faster read times than the non-numpy options; however, numpy creation time can be orders of magnitude slower.

So if reading is most important, there is a clear advantage to using a numpy array. If reading and creation are of equal importance, however, then a straight list is likely best. That said, I don't have a clean way of measuring memory usage, but I suspect that lists are far less memory efficient than tuples or numpy arrays. Also, while it's a small difference, I found .get on a dictionary to be slightly faster than the __getitem__ call (i.e. dictionary.get(lookup) vs. dictionary[lookup]).
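
That last difference is easy to check with timeit (a quick sketch; absolute numbers will vary by interpreter):

import timeit

setup = "d = dict.fromkeys(range(1000), 0)"
print timeit.timeit("d[500]", setup=setup)      # __getitem__
print timeit.timeit("d.get(500)", setup=setup)  # .get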

Timings:

Python 2.7.5

Reading

 - Option 1: tuple access... 2045.51 ms
 - Option 2: tuple access (again)... 2081.97 ms    # sampling effect of cache
 - Option 3: list access... 2072.09 ms
 - Option 4: dict access... 3436.53 ms
 - Option 5: iterable creation... N/A
 - Option 6: numpy array... 1752.44 ms

Creation

 - Option 1: tuple creation... 690.36 ms
 - Option 2: tuple creation (again)... 716.49 ms    # sampling effect of cache
 - Option 3: list creation... 684.28 ms
 - Option 4: dict creation... 1498.94 ms
 - Option 5: iterable creation...  0.01 ms
 - Option 6: numpy creation... 3514.25 ms

Pypy 2.2.1

Reading

 - Option 1: tuple access... 243.34 ms
 - Option 2: tuple access (again)... 246.51 ms    # sampling effect of cache
 - Option 3: list access... 139.65 ms
 - Option 4: dict access... 454.65 ms
 - Option 5: iterable creation... N/A
 - Option 6: numpy array... 21.60 ms

Creation

 - Option 1: tuple creation... 1016.27 ms
 - Option 2: tuple creation (again)... 1063.50 ms    # sampling effect of cache
 - Option 3: list creation... 365.98 ms
 - Option 4: dict creation... 2258.44 ms
 - Option 5: iterable creation...  0.00 ms
 - Option 6: numpy creation... 12514.20 ms

In all the examples, random lookups were generated against random data:

import random

dsize = 10 ** 7   # 10 million data points
data = [(i, random.random() * dsize)
        for i in range(dsize)]
lookup = tuple(int(random.random() * dsize) for i in range(dsize))

The loops were very simple:

for x in lookup:
    data_of_specific_type[x]

where data_of_specific_type was data converted into the type under test (e.g. tuple(data), list(data), etc.).
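
Putting those pieces together, a self-contained version of the read benchmark looks roughly like this (a sketch; the original harness isn't shown, and the loop labels are mine):

import random
import time

dsize = 10 ** 6   # smaller than the original 10 ** 7 for a quicker run
data = [(i, random.random() * dsize) for i in range(dsize)]
lookup = tuple(int(random.random() * dsize) for i in range(dsize))

for label, data_of_specific_type in [('tuple', tuple(data)),
                                     ('list', list(data)),
                                     ('dict', dict(data))]:
    start = time.time()
    for x in lookup:
        data_of_specific_type[x]
    print '%s access... %.2f ms' % (label, (time.time() - start) * 1000)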

Brian Bruggeman asked Nov 11 '22


1 Answer

Part of the issue is that:

np.array([p
    for rp in generate(size=rsize, width=rsize*csize)
    for p in generate(size=csize, width=csize, point=rp)])

does all the work of building the full list in Python first and then converting it to an np.array.

A faster way to do this would be to pre-allocate the array and fill it in place:

import numpy as np

n = rsize ** 3 * csize ** 3   # total number of points
arr = np.empty((n, 3))        # one row per (x, y, z) point
i = 0
for rp in generate(size=rsize, width=rsize * csize):
    for p in generate(size=csize, width=csize, point=rp):
        arr[i] = p
        i += 1
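
Going further, the whole grid can be built inside numpy with no Python-level loop at all. Assuming the default point=(0, 0, 0), the nested generate() calls produce, in aggregate, a uniform grid of rsize * csize cell centers per axis, so the same set of points (in a different order) can be built with broadcasting. A sketch; generate_grid is my own name:

import numpy as np

def generate_grid(rsize=8, csize=32):
    n = rsize * csize                     # points per axis (256 here)
    axis = np.arange(n) - (n - 1) / 2.0   # cell centers: -127.5 ... 127.5
    x, y, z = np.meshgrid(axis, axis, axis, indexing='ij')
    return np.column_stack((x.ravel(), y.ravel(), z.ravel()))

pts = generate_grid()   # shape (16777216, 3)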
Alex Gaynor answered Nov 15 '22