Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cython either marginally faster or slower than pure Python

I am using several techniques (NumPy, Weave and Cython) to perform a Python performance benchmark. What the code basically does mathematically is C = AB, where A, B and C are N x N matrices (NOTE: this is a matrix product and not an element-wise multiplication).

I have written 5 distinct implementations of the code:

  1. Pure python (Loop over 2D Python lists)
  2. NumPy (Dot product of 2D NumPy arrays)
  3. Weave inline (C++ loop over 2D arrays)
  4. Cython (Loop over 2D Python lists + static typing)
  5. Cython-Numpy (Loop over 2D NumPy arrays + static typing)

My expectation is that implementations 2 through 5 will be substantially faster than implementation 1. My results however indicate otherwise. These are my normalised speed-up results relative to the pure Python implementation:

  • python_list: 1.00
  • numpy_array: 330.09
  • weave_inline: 30.72
  • cython_list: 2.80
  • cython_array: 0.14

I am quite happy with the performance of NumPy, however I am less enthusiastic about Weave's performance and Cython's performance makes me cry. My entire code is separated over two files. Everything is automated and you simply need to run the first file to see all results. Could someone please aid me by indicating what I could do to obtain better results?

matmul.py:

import time

import numpy as np
from scipy import weave
from scipy.weave import converters

import pyximport
pyximport.install()
import cython_matmul as cml


def python_list_matmul(A, B):
    C = np.zeros(A.shape, dtype=float).tolist()
    A = A.tolist()
    B = B.tolist()
    for k in xrange(len(A)):
        for i in xrange(len(A)):
            for j in xrange(len(A)):
                C[i][k] += A[i][j] * B[j][k]
    return C


def numpy_array_matmul(A, B):
    return np.dot(A, B)


def weave_inline_matmul(A, B):
    code = """
       int i, j, k;
       for (k = 0; k < N; ++k)
       {
           for (i = 0; i < N; ++i)
           {
               for (j = 0; j < N; ++j)
               {
                   C(i, k) += A(i, j) * B(j, k);
               }
           }
       }
       """

    C = np.zeros(A.shape, dtype=float)
    weave.inline(code, ['A', 'B', 'C', 'N'], type_converters=converters.blitz, compiler='gcc')
    return C


N = 100
A = np.random.rand(N, N)
B = np.random.rand(N, N)

function = []
function.append([python_list_matmul, 'python_list'])
function.append([numpy_array_matmul, 'numpy_array'])
function.append([weave_inline_matmul, 'weave_inline'])
function.append([cml.cython_list_matmul, 'cython_list'])
function.append([cml.cython_array_matmul, 'cython_array'])

t = []
for i in xrange(len(function)):
    t1 = time.time()
    C = function[i][0](A, B)
    t2 = time.time()
    t.append(t2 - t1)
    print function[i][1] + ' \t: ' + '{:10.6f}'.format(t[0] / t[-1])

cython_matmul.pyx:

import numpy as np
cimport numpy as np

import cython
cimport cython

DTYPE = np.float
ctypedef np.float_t DTYPE_t


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef cython_list_matmul(A, B):

    cdef int i, j, k
    cdef int N = len(A)

    A = A.tolist()
    B = B.tolist()
    C = np.zeros([N, N]).tolist()

    for k in xrange(N):
        for i in xrange(N):
            for j in xrange(N):
                C[i][k] += A[i][j] * B[j][k]
    return C


@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
cpdef cython_array_matmul(np.ndarray[DTYPE_t, ndim=2] A, np.ndarray[DTYPE_t, ndim=2] B):

    cdef int i, j, k, N = A.shape[0]
    cdef np.ndarray[DTYPE_t, ndim=2] C = np.zeros([N, N], dtype=DTYPE)

    for k in xrange(N):
        for i in xrange(N):
            for j in xrange(N):
                C[i][k] += A[i][j] * B[j][k]
    return C
like image 736
Aeronaelius Avatar asked Dec 21 '22 04:12

Aeronaelius


1 Answers

Python lists and high performance math are incompatible, forget about cython_list_matmul.

The only problem with your cython_array_matmul is incorrect usage of indexing. It should be

C[i,k] += A[i,j] * B[j,k]

That's how numpy arrays are indexed in Python and that's the syntax Cython optimizes. With this change you should get decent performance.

Cython's annotation feature is really helpful in spotting optimization problems like this one. You could notice that A[i][j] produces a ton of Python API calls, while A[i,j] produces none.

Also, if you initialize all entries by hand, np.empty is more appropriate than np.zeros.

like image 83
Nikita Nemkin Avatar answered Dec 26 '22 12:12

Nikita Nemkin