 

What is the fastest way in Cython to create a new array from an existing array and a variable

Suppose I have an array

from array import array
myarr = array('l', [1, 2, 3])

and a variable myvar = 4. What is the fastest way to create a new array:

newarray = array('l', [1, 2, 3, 4])

You can assume all elements are of type 'long'.

I have tried creating a new array and using array.append(), but I am not sure it is the fastest way. I was thinking of using a memoryview, along the lines of malloc(4*sizeof(long)), but I don't know how to copy the shorter array into part of the memoryview and then insert the last element into the last position.

I am fairly new to Cython. Thanks for any help!

Update: I compared the following three methods:

Cython: [100000 loops, best of 3: 5.94 µs per loop]

from libc.stdlib cimport malloc

def cappend(long[:] arr, long var, size_t N):
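    # note: the malloc'ed buffer is never freed, so this version leaks memory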
    cdef long[:] result = <long[:(N+1)]>malloc((N+1)*sizeof(long))
    result.base[:N] = arr
    result.base[N] = var
    return result

array: [1000000 loops, best of 3: 1.21 µs per loop]

from array import array
import copy
def pyappend(arr, x):
    result = copy.copy(arr)
    result.append(x)
    return result

list append: [1000000 loops, best of 3: 480 ns per loop]

def pylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result

Is there hope to improve the Cython part and beat the array one?

asked Oct 27 '17 by Weiwen Gu



1 Answer

Cython gives us more access to the internals of array.array than "normal" Python, so we can use it to speed up the code:

  1. by almost a factor of 7 for your small example (by eliminating most of the overhead).
  2. by a factor of 2 for larger inputs (by eliminating an unnecessary array copy).

Read on for more details.


It's a little unusual to try to optimize a function for such small inputs, but it is not without (at least theoretical) interest.

So let's start with your functions as baseline:

a=array('l', [1,2,3])
%timeit pyappend(a, 8)
1.03 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

lst=[1,2,3]
%timeit pylistappend(lst, 8)
279 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

We must be aware that what we are measuring is not the cost of copying but the cost of overhead (the Python interpreter, function calls, and so on); for example, it makes no difference whether a has 3 or 5 elements:

a=array('l', range(5))
%timeit pyappend(a, 8)
1.03 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In the array version we have more overhead because of the indirection via the copy module; we can try to eliminate that:

def pyappend2(arr, x):
    result = array('l', arr)
    result.append(x)
    return result

%timeit pyappend2(a, 8)
496 ns ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

That is faster. Now let's use Cython; this eliminates the interpreter costs:

%%cython
def cylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result

%%cython
from cpython cimport array
def cyappend(array.array arr, long long int x):
    cdef array.array res = array.array('l', arr)
    res.append(x)
    return res

%timeit cylistappend(lst, 8)
193 ns ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cyappend(a, 8)
421 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The Cython versions are about 33% faster for the list and about 10% faster for the array. The constructor array.array() expects an iterable, but we already have an array.array, so we can use the functionality from cpython to access the internals of the array.array object and improve the situation a little:

%%cython
from cpython cimport array
def cyappend2(array.array arr, long long int x):
    cdef array.array res = array.copy(arr)
    res.append(x)
    return res

%timeit cyappend2(a, 8)
305 ns ± 7.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For the next step we need to know how array.array appends elements: normally it over-allocates, so append() has amortized cost O(1); after array.copy, however, the new array has exactly the needed number of elements, and the next append() triggers a reallocation. We need to change that (the functions used below are declared in Cython's cpython/array.pxd):
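
For reference, here is a paraphrased (not verbatim) subset of the declarations in Cython's Cython/Includes/cpython/array.pxd; check your Cython version for the exact signatures:

# raw access to the buffer of an array.array, one pointer per typecode
ctypedef union __data_union:
    long* as_longs
    long long* as_longlongs
    void* as_voidptr
    # ... pointers for the remaining typecodes

# fast creation of a new array with the template's typecode and the given
# length; if zero is true, the new buffer is zero-initialized
cdef array clone(array template, Py_ssize_t length, bint zero)

cdef array copy(array self)                                 # copy an array
cdef int resize(array self, Py_ssize_t n) except -1         # exact resize
cdef int resize_smart(array self, Py_ssize_t n) except -1   # over-allocating resize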

%%cython
from cpython cimport array
from libc.string cimport memcpy

def cyappend3(array.array arr, long long int x):
    cdef Py_ssize_t n = len(arr)
    cdef array.array res = array.clone(arr, n + 1, False)
    memcpy(res.data.as_voidptr, arr.data.as_voidptr, 8*n)  # sloppy: hard-codes 8-byte elements
    res.data.as_longlongs[n] = x
    return res

%timeit cyappend3(a, 8)
154 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Similar to your function, the memory is allocated with room for the extra element up front, so we no longer need to call resize(). Now we are faster than the list version and almost 7 times faster than the original Python version.
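
As an aside, here is a sketch of a variant (cyappend3_portable is a hypothetical name) that avoids the hard-coded 8 by using sizeof(long), which matches the 'l' typecode; it has not been benchmarked separately:

%%cython
from cpython cimport array
from libc.string cimport memcpy

def cyappend3_portable(array.array arr, long x):
    cdef Py_ssize_t n = len(arr)
    cdef array.array res = array.clone(arr, n + 1, False)
    # copy n elements of C type 'long'; no hard-coded item size
    memcpy(res.data.as_voidptr, arr.data.as_voidptr, n * sizeof(long))
    res.data.as_longs[n] = x
    return res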


Let's compare timings for bigger arrays (a=array('l', range(1000)), lst=list(range(1000))), where copying the data accounts for most of the running time:

  pyappend             1.84 µs  #copy-module is slow!
  pyappend2            1.02 µs
  cyappend             0.94 µs  #cython no big help - we are copying twice
  cyappend2            0.90 µs  #still copying twice
  cyappend3            0.43 µs  #copying only once -> twice as fast!

  pylistappend         4.09 µs # needs to increment refs of integers
  cylistappend         3.85 µs # the same as above

Now, eliminating the unnecessary copy for array.array gives us the expected factor of 2.


For even bigger arrays (10000 elements), we see the following:

  pyappend             6.9 µs  #copy-module is slow!
  pyappend2            4.8 µs
  cyappend2            4.4 µs  
  cyappend3            4.4 µs  

There is no longer a difference between the versions (if one discards the slow copy module). The reason is the changed behavior of array.array for such a large number of elements: when copying, it over-allocates, thus avoiding the reallocation after the first append().

We can easily check it:

b = array('l', array('l', range(10**3)))  # emulate our functions
b.buffer_info()
(94481422849232, 1000)
b.append(1)
b.buffer_info()
(94481422860352, 1001)  # another pointer address -> reallocated
...
b = array('l', array('l', range(10**4)))
b.buffer_info()
(94481426290064, 10000)
b.append(33)
b.buffer_info()
(94481426290064, 10001)  # the same pointer address -> no reallocation!
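
For convenience, the same check as a small self-contained script (the addresses and the exact growth behavior depend on the CPython version and platform):

from array import array

def check_realloc(n):
    # emulate our functions: copy an array, then append one element
    b = array('l', array('l', range(n)))
    before = b.buffer_info()[0]  # address of the underlying buffer
    b.append(1)
    after = b.buffer_info()[0]
    print(n, 'reallocated' if before != after else 'no reallocation')

check_realloc(10**3)  # exact-sized copy -> append reallocates
check_realloc(10**4)  # over-allocated copy -> append reuses the buffer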
answered Nov 15 '22 by ead