Suppose I have an array
from array import array
myarr = array('l', [1, 2, 3])
and a variable:
myvar = 4
what is the fastest way to create a new array:
newarray = array('l', [1, 2, 3, 4])
You can assume all elements are of type 'long'.
I have tried creating a new array and using array.append(), but I am not sure that is the fastest. I was thinking of using a memoryview, something like:
malloc(4*sizeof(long))
but I don't know how to copy the shorter array into part of the memoryview and then insert the last element into the last position.
I am fairly new to Cython. Thanks for any help!
Update: I compared the following three methods:
Cython: [100000 loops, best of 3: 5.94 µs per loop]
from libc.stdlib cimport malloc
def cappend(long[:] arr, long var, size_t N):
    cdef long[:] result = <long[:(N+1)]>malloc((N+1)*sizeof(long))
    result.base[:N] = arr
    result.base[N] = var
    return result
array: [1000000 loops, best of 3: 1.21 µs per loop]
from array import array
import copy
def pyappend(arr, x):
    result = copy.copy(arr)
    result.append(x)
    return result
list append: [1000000 loops, best of 3: 480 ns per loop]
def pylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result
is there hope to improve the cython part and beat the array one?
Cython gives us more access to the internals of array.array than "normal" Python, so we can utilize it to speed up the code: by a factor of almost 7 for your small example (by eliminating most of the overhead), and by a factor of 2 for larger inputs by eliminating an unnecessary array-copy. Read on for more details.
It's a little bit unusual to try to optimize a function for such small input, but not without (at least theoretical) interest.
So let's start with your functions as baseline:
a=array('l', [1,2,3])
%timeit pyappend(a, 8)
1.03 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
lst=[1,2,3]
%timeit pylistappend(lst, 8)
279 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
We must be aware that what we are measuring is not the cost of copying but the cost of overhead (the Python interpreter, function calls and so on); for example, there is no difference whether a has 3 or 5 elements:
a=array('l', range(5))
%timeit pyappend(a, 8)
1.03 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In the array version we have more overhead because of the indirection via the copy module; we can try to eliminate that:
def pyappend2(arr, x):
    result = array('l', arr)
    result.append(x)
    return result
%timeit pyappend2(a, 8)
496 ns ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
That is faster. Now let's use Cython, which eliminates the interpreter costs:
%%cython
def cylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result
%%cython
from cpython cimport array
def cyappend(array.array arr, long long int x):
    cdef array.array res = array.array('l', arr)
    res.append(x)
    return res
%timeit cylistappend(lst, 8)
193 ns ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cyappend(a, 8)
421 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The Cython versions are about 33% faster for list and about 10% faster for array. The constructor array.array() expects an iterable, but we already have an array.array, so we use the functionality from cpython to get access to the internals of the array.array object and improve the situation a little:
%%cython
from cpython cimport array
def cyappend2(array.array arr, long long int x):
    cdef array.array res = array.copy(arr)
    res.append(x)
    return res
%timeit cyappend2(a, 8)
305 ns ± 7.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For the next step we need to know how array.array appends elements: normally it over-allocates, so append() has amortized cost O(1); however, after array.copy the new array holds exactly the needed number of elements, and the next append triggers a reallocation. We need to change that (the functions used below are described in Cython's documentation for cpython.array):
%%cython
from cpython cimport array
from libc.string cimport memcpy
def cyappend3(array.array arr, long long int x):
    cdef Py_ssize_t n = len(arr)
    cdef array.array res = array.clone(arr, n+1, False)
    memcpy(res.data.as_voidptr, arr.data.as_voidptr, 8*n)  # that is pretty sloppy...
    res.data.as_longlongs[n] = x
    return res
%timeit cyappend3(a, 8)
154 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
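As a quick sanity check (a minimal check, not part of the original benchmark) that cyappend3 returns the expected result:
assert list(cyappend3(array('l', [1, 2, 3]), 4)) == [1, 2, 3, 4]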
Similar to your function, the memory is not over-allocated, so we don't need to call resize() any longer. Now we are faster than the list versions and almost 7 times faster than the original Python version.
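As an aside, the hard-coded 8 in the memcpy call (flagged as sloppy in the comment) can be derived from the element type instead. A minimal sketch, assuming the typecode stays 'l' so the items really are C longs (the name cyappend3b is just illustrative):
%%cython
from cpython cimport array
from libc.string cimport memcpy
def cyappend3b(array.array arr, long x):
    # same idea as cyappend3, but the byte count comes from sizeof(long)
    # instead of a hard-coded 8 (correct for typecode 'l')
    cdef Py_ssize_t n = len(arr)
    cdef array.array res = array.clone(arr, n+1, False)
    memcpy(res.data.as_voidptr, arr.data.as_voidptr, n*sizeof(long))
    res.data.as_longs[n] = x
    return res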
Let's compare timings for bigger array sizes (a=array('l', range(1000)), lst=list(range(1000))), where copying of the data makes up most of the running time:
pyappend 1.84 µs #copy-module is slow!
pyappend2 1.02 µs
cyappend 0.94 µs #cython no big help - we are copying twice
cyappend2 0.90 µs #still copying twice
cyappend3 0.43 µs #copying only once -> twice as fast!
pylistappend 4.09 µs # needs to increment refs of integers
cylistappend 3.85 µs # the same as above
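(For reference, a rough sketch of how these numbers can be reproduced in the same IPython session, assuming all of the functions above are already defined:)
a = array('l', range(1000))
lst = list(range(1000))
%timeit pyappend(a, 8)
%timeit pyappend2(a, 8)
%timeit cyappend(a, 8)
%timeit cyappend2(a, 8)
%timeit cyappend3(a, 8)
%timeit pylistappend(lst, 8)
%timeit cylistappend(lst, 8)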
Now, eliminating the unnecessary copy for array.array gives us the expected factor of 2.
For even bigger arrays (10000 elements), we see the following:
pyappend 6.9 µs #copy-module is slow!
pyappend2 4.8 µs
cyappend2 4.4 µs
cyappend3 4.4 µs
There is no longer a difference between the versions (if one discards the slow copy module). The reason for this is the changed behavior of array.array for such a large number of elements: when copying, it over-allocates, thus avoiding the reallocation after the first append().
We can easily check it:
b = array('l', array('l', range(10**3)))  # emulate our functions
b.buffer_info()
(94481422849232, 1000)
b.append(1)
b.buffer_info()
(94481422860352, 1001)  # another pointer address -> reallocated
...
b = array('l', array('l', range(10**4)))
b.buffer_info()
(94481426290064, 10000)
b.append(33)
b.buffer_info()
(94481426290064, 10001)  # the same pointer address -> no reallocation!