 

What is the fastest way in Cython to create a new array from an existing array and a variable

Suppose I have an array

from array import array
myarr = array('l', [1, 2, 3])

and a variable myvar = 4. What is the fastest way to create a new array:

newarray = array('l', [1, 2, 3, 4])

You can assume all elements are of type 'long'.

I have tried creating a new array and using array.append(), but I am not sure it is the fastest way. I was thinking of using a memoryview, along the lines of malloc(4*sizeof(long)), but I don't know how to copy the shorter array into part of the memoryview and then insert the last element into the last position.

I am fairly new to Cython. Thanks for any help!

Update: I compared the following three methods:

Cython: [100000 loops, best of 3: 5.94 µs per loop]

from libc.stdlib cimport malloc

def cappend(long[:] arr, long var, size_t N):
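    # note: the malloc'ed buffer is never freed, so this version leaks memory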
    cdef long[:] result = <long[:(N+1)]>malloc((N+1)*sizeof(long))
    result.base[:N] = arr
    result.base[N] = var
    return result

array: [1000000 loops, best of 3: 1.21 µs per loop]

from array import array
import copy
def pyappend(arr, x):
    result = copy.copy(arr)
    result.append(x)
    return result

list append: [1000000 loops, best of 3: 480 ns per loop]

def pylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result

Is there hope to improve the Cython part and beat the array one?

asked Oct 27 '17 by Weiwen Gu



1 Answer

Cython gives us more access to the internals of array.array than "normal" Python, so we can use it to speed up the code:

  1. by almost a factor of 7 for your small example (by eliminating most of the overhead).
  2. by a factor of 2 for larger inputs (by eliminating an unnecessary array copy).

Read on for more details.


It's a little unusual to try to optimize a function for such small inputs, but it is not without (at least theoretical) interest.

So let's start with your functions as baseline:

a=array('l', [1,2,3])
%timeit pyappend(a, 8)
1.03 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

lst=[1,2,3]
%timeit pylistappend(lst, 8)
279 ns ± 6.03 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

We must be aware that what we are measuring is not the cost of copying but the cost of overhead (the Python interpreter, function calls, and so on); for example, it makes no difference whether a has 3 or 5 elements:

a=array('l', range(5))
%timeit pyappend(a, 8)
1.03 µs ± 6.76 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In the array version we have more overhead because of the indirection via the copy module; we can try to eliminate that:

def pyappend2(arr, x):
    result = array('l', arr)
    result.append(x)
    return result

%timeit pyappend2(a, 8)
496 ns ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

That is faster. Now let's use Cython; this eliminates the interpreter costs:

%%cython
def cylistappend(lst, x):
    result = lst[:]
    result.append(x)
    return result

%%cython
from cpython cimport array
def cyappend(array.array arr, long long int x):
    cdef array.array res = array.array('l', arr)
    res.append(x)
    return res

%timeit cylistappend(lst, 8)
193 ns ± 12.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit cyappend(a, 8)
421 ns ± 8.08 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

The Cython versions are about 33% faster for the list and about 10% faster for the array. The constructor array.array() expects an iterable, but we already have an array.array, so we can use the functionality from cpython to access the internals of the array.array object and improve the situation a little:

%%cython
from cpython cimport array
def cyappend2(array.array arr, long long int x):
    cdef array.array res = array.copy(arr)
    res.append(x)
    return res

%timeit cyappend2(a, 8)
305 ns ± 7.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For the next step we need to know how array.array appends elements: normally it over-allocates, so append() has amortized cost O(1); after array.copy, however, the new array has exactly the needed number of elements, and the next append() triggers a reallocation. We need to change that (the functions used below are declared in Cython's cpython/array.pxd):
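
For reference, here is a paraphrased (not verbatim) subset of the declarations in Cython's Cython/Includes/cpython/array.pxd; check your Cython version for the exact signatures:

# raw access to the buffer of an array.array, one pointer per typecode
ctypedef union __data_union:
    long* as_longs
    long long* as_longlongs
    void* as_voidptr
    # ... pointers for the remaining typecodes

# fast creation of a new array with the template's typecode and the given
# length; if zero is true, the new buffer is zero-initialized
cdef array clone(array template, Py_ssize_t length, bint zero)

cdef array copy(array self)                                 # copy an array
cdef int resize(array self, Py_ssize_t n) except -1         # exact resize
cdef int resize_smart(array self, Py_ssize_t n) except -1   # over-allocating resize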

%%cython
from cpython cimport array
from libc.string cimport memcpy

def cyappend3(array.array arr, long long int x):
    cdef Py_ssize_t n = len(arr)
    cdef array.array res = array.clone(arr, n + 1, False)
    memcpy(res.data.as_voidptr, arr.data.as_voidptr, 8*n)  # sloppy: hard-codes 8-byte elements
    res.data.as_longlongs[n] = x
    return res

%timeit cyappend3(a, 8)
154 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Similar to your function, the memory is allocated with room for the extra element up front, so we no longer need to call resize(). Now we are faster than the list version and almost 7 times faster than the original Python version.
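
As an aside, here is a sketch of a variant (cyappend3_portable is a hypothetical name) that avoids the hard-coded 8 by using sizeof(long), which matches the 'l' typecode; it has not been benchmarked separately:

%%cython
from cpython cimport array
from libc.string cimport memcpy

def cyappend3_portable(array.array arr, long x):
    cdef Py_ssize_t n = len(arr)
    cdef array.array res = array.clone(arr, n + 1, False)
    # copy n elements of C type 'long'; no hard-coded item size
    memcpy(res.data.as_voidptr, arr.data.as_voidptr, n * sizeof(long))
    res.data.as_longs[n] = x
    return res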


Let's compare timings for bigger arrays (a=array('l', range(1000)), lst=list(range(1000))), where copying the data accounts for most of the running time:

  pyappend             1.84 µs  #copy-module is slow!
  pyappend2            1.02 µs
  cyappend             0.94 µs  #cython no big help - we are copying twice
  cyappend2            0.90 µs  #still copying twice
  cyappend3            0.43 µs  #copying only once -> twice as fast!

  pylistappend         4.09 µs # needs to increment refs of integers
  cylistappend         3.85 µs # the same as above

Now, eliminating the unnecessary copy for array.array gives us the expected factor of 2.


For even bigger arrays (10000 elements), we see the following:

  pyappend             6.9 µs  #copy-module is slow!
  pyappend2            4.8 µs
  cyappend2            4.4 µs  
  cyappend3            4.4 µs  

There is no longer a difference between the versions (if one discards the slow copy module). The reason is the changed behavior of array.array for such a large number of elements: when copying, it over-allocates, thus avoiding the reallocation after the first append().

We can easily check it:

b = array('l', array('l', range(10**3)))  # emulate our functions
b.buffer_info()
(94481422849232, 1000)
b.append(1)
b.buffer_info()
(94481422860352, 1001)  # another pointer address -> reallocated
...
b = array('l', array('l', range(10**4)))
b.buffer_info()
(94481426290064, 10000)
b.append(33)
b.buffer_info()
(94481426290064, 10001)  # the same pointer address -> no reallocation!
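
For convenience, the same check as a small self-contained script (the addresses and the exact growth behavior depend on the CPython version and platform):

from array import array

def check_realloc(n):
    # emulate our functions: copy an array, then append one element
    b = array('l', array('l', range(n)))
    before = b.buffer_info()[0]  # address of the underlying buffer
    b.append(1)
    after = b.buffer_info()[0]
    print(n, 'reallocated' if before != after else 'no reallocation')

check_realloc(10**3)  # exact-sized copy -> append reallocates
check_realloc(10**4)  # over-allocated copy -> append reuses the buffer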
answered Nov 15 '22 by ead