The Cython documentation on typed memory views list three ways of assigning to a typed memory view: <ol> <li>from a raw C pointer,</li> <li>from a <code>np.ndarray</code> and</li> <li>from a <code>cython.view.array</code>.</li> </ol> Assume that I don't have data passed in to my cython function from outside but instead want to allocate memory and return it as a <code>np.ndarray</code>, which of those options do I chose? Also assume that the size of that buffer is not a compile-time constant i.e. I can't allocate on the stack, but would need to <code>malloc</code> for option 1. The 3 options would therefore looke something like this: <pre class="prettyprint"><code>from libc.stdlib cimport malloc, free cimport numpy as np from cython cimport view np.import_array() def memview_malloc(int N): cdef int * m = <int *>malloc(N * sizeof(int)) cdef int[::1] b = <int[:N]>m free(<void *>m) def memview_ndarray(int N): cdef int[::1] b = np.empty(N, dtype=np.int32) def memview_cyarray(int N): cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i") </code></pre> What is surprising to me is that in all three cases, Cython generates quite a lot of code for the memory allocation, in particular a call to <code>__Pyx_PyObject_to_MemoryviewSlice_dc_int</code>. This suggests (and I might be wrong here, my insight into the inner workings of Cython are very limited) that it first creates a Python object and then "casts" it into a memory view, which seems unnecessary overhead. A simple benchmark doesn't reveal much difference between the three methods, with 2. being the fastest by a thin margin. Which of the three methods is recommended? Or is there a different, better option? Follow-up question: I want to finally return the result as a <code>np.ndarray</code>, after having worked with that memory view in the function. Is a typed memory view the best choice or would I rather just use the old buffer interface as below to create an <code>ndarray</code> in the first place? <pre class="prettyprint"><code>cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32) </code></pre>

Look here for an answer. The basic idea is that you want <code>cpython.array.array</code> and <code>cpython.array.clone</code> (not <code>cython.array.*</code>): <pre class="prettyprint"><code>from cpython.array cimport array, clone # This type is what you want and can be cast to things of # the "double[:]" syntax, so no problems there cdef array[double] armv, templatemv templatemv = array('d') # This is fast armv = clone(templatemv, L, False) </code></pre> EDIT It turns out that the benchmarks in that thread were rubbish. Here's my set, with my timings: <pre class="prettyprint"><code># cython: language_level=3 # cython: boundscheck=False # cython: wraparound=False import time import sys from cpython.array cimport array, clone from cython.view cimport array as cvarray from libc.stdlib cimport malloc, free import numpy as numpy cimport numpy as numpy cdef int loops def timefunc(name): def timedecorator(f): cdef int L, i print("Running", name) for L in [1, 10, 100, 1000, 10000, 100000, 1000000]: start = time.clock() f(L) end = time.clock() print(format((end-start) / loops * 1e6, "2f"), end=" ") sys.stdout.flush() print("μs") return timedecorator print() print("INITIALISATIONS") loops = 100000 @timefunc("cpython.array buffer") def _(int L): cdef int i cdef array[double] arr, template = array('d') for i in range(loops): arr = clone(template, L, False) # Prevents dead code elimination str(arr[0]) @timefunc("cpython.array memoryview") def _(int L): cdef int i cdef double[::1] arr cdef array template = array('d') for i in range(loops): arr = clone(template, L, False) # Prevents dead code elimination str(arr[0]) @timefunc("cpython.array raw C type") def _(int L): cdef int i cdef array arr, template = array('d') for i in range(loops): arr = clone(template, L, False) # Prevents dead code elimination str(arr[0]) @timefunc("numpy.empty_like memoryview") def _(int L): cdef int i cdef double[::1] arr template = numpy.empty((L,), dtype='double') for i in range(loops): arr = numpy.empty_like(template) # Prevents dead code elimination str(arr[0]) @timefunc("malloc") def _(int L): cdef int i cdef double* arrptr for i in range(loops): arrptr = <double*> malloc(sizeof(double) * L) free(arrptr) # Prevents dead code elimination str(arrptr[0]) @timefunc("malloc memoryview") def _(int L): cdef int i cdef double* arrptr cdef double[::1] arr for i in range(loops): arrptr = <double*> malloc(sizeof(double) * L) arr = <double[:L]>arrptr free(arrptr) # Prevents dead code elimination str(arr[0]) @timefunc("cvarray memoryview") def _(int L): cdef int i cdef double[::1] arr for i in range(loops): arr = cvarray((L,),sizeof(double),'d') # Prevents dead code elimination str(arr[0]) print() print("ITERATING") loops = 1000 @timefunc("cpython.array buffer") def _(int L): cdef int i cdef array[double] arr = clone(array('d'), L, False) cdef double d for i in range(loops): for i in range(L): d = arr[i] # Prevents dead-code elimination str(d) @timefunc("cpython.array memoryview") def _(int L): cdef int i cdef double[::1] arr = clone(array('d'), L, False) cdef double d for i in range(loops): for i in range(L): d = arr[i] # Prevents dead-code elimination str(d) @timefunc("cpython.array raw C type") def _(int L): cdef int i cdef array arr = clone(array('d'), L, False) cdef double d for i in range(loops): for i in range(L): d = arr[i] # Prevents dead-code elimination str(d) @timefunc("numpy.empty_like memoryview") def _(int L): cdef int i cdef double[::1] arr = numpy.empty((L,), dtype='double') cdef double d for i in range(loops): for i in range(L): d = arr[i] # Prevents dead-code elimination str(d) @timefunc("malloc") def _(int L): cdef int i cdef double* arrptr = <double*> malloc(sizeof(double) * L) cdef double d for i in range(loops): for i in range(L): d = arrptr[i] free(arrptr) # Prevents dead-code elimination str(d) @timefunc("malloc memoryview") def _(int L): cdef int i cdef double* arrptr = <double*> malloc(sizeof(double) * L) cdef double[::1] arr = <double[:L]>arrptr cdef double d for i in range(loops): for i in range(L): d = arr[i] free(arrptr) # Prevents dead-code elimination str(d) @timefunc("cvarray memoryview") def _(int L): cdef int i cdef double[::1] arr = cvarray((L,),sizeof(double),'d') cdef double d for i in range(loops): for i in range(L): d = arr[i] # Prevents dead-code elimination str(d) </code></pre> Output: <pre class="prettyprint"><code>INITIALISATIONS Running cpython.array buffer 0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs Running cpython.array memoryview 0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs Running cpython.array raw C type 0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs Running numpy.empty_like memoryview 1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs Running malloc 0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs Running malloc memoryview 1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs Running cvarray memoryview 1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs ITERATING Running cpython.array buffer 0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs Running cpython.array memoryview 0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs Running cpython.array raw C type 0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs Running numpy.empty_like memoryview 0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs Running malloc 0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs Running malloc memoryview 0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs Running cvarray memoryview 0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs </code></pre> (The reason for the "iterations" benchmark is that some methods have surprisingly different characteristics in this respect.) In order of initialisation speed: <code>malloc</code>: This is a harsh world, but it's fast. If you need to to allocate a lot of things and have unhindered iteration and indexing performance, this has to be it. But normally you're a good bet for... <code>cpython.array raw C type</code>: Well damn, it's fast. And it's safe. Unfortunately it goes through Python to access its data fields. You can avoid that by using a wonderful trick: <pre class="prettyprint"><code>arr.data.as_doubles[i] </code></pre> which brings it up to the standard speed while removing safety! This makes this a wonderful replacement for <code>malloc</code>, being basically a pretty reference-counted version! <code>cpython.array buffer</code>: Coming in at only three to four times the setup time of <code>malloc</code>, this is looks a wonderful bet. Unfortunately it has significant overhead (albeit small compared to the <code>boundscheck</code> and <code>wraparound</code> directives). That means it only really competes against full-safety variants, but it is the fastest of those to initialise. Your choice. <code>cpython.array memoryview</code>: This is now an order of magnitude slower than <code>malloc</code> to initialise. That's a shame, but it iterates just as fast. This is the standard solution that I would suggest unless <code>boundscheck</code> or <code>wraparound</code> are on (in which case <code>cpython.array buffer</code> might be a more compelling tradeoff). The rest. The only one worth anything is <code>numpy</code>'s, due to the many fun methods attached to the objects. That's it, though.

What is the recommended way of allocating memory for a typed memory view?

Tags:

python

memory-management

memory

buffer

cython

The Cython documentation on typed memory views list three ways of assigning to a typed memory view:

from a raw C pointer,
from a np.ndarray and
from a cython.view.array.

Assume that I don't have data passed in to my cython function from outside but instead want to allocate memory and return it as a np.ndarray, which of those options do I chose? Also assume that the size of that buffer is not a compile-time constant i.e. I can't allocate on the stack, but would need to malloc for option 1.

The 3 options would therefore looke something like this:

from libc.stdlib cimport malloc, free cimport numpy as np from cython cimport view  np.import_array()  def memview_malloc(int N):     cdef int * m = <int *>malloc(N * sizeof(int))     cdef int[::1] b = <int[:N]>m     free(<void *>m)  def memview_ndarray(int N):     cdef int[::1] b = np.empty(N, dtype=np.int32)  def memview_cyarray(int N):     cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")

What is surprising to me is that in all three cases, Cython generates quite a lot of code for the memory allocation, in particular a call to __Pyx_PyObject_to_MemoryviewSlice_dc_int. This suggests (and I might be wrong here, my insight into the inner workings of Cython are very limited) that it first creates a Python object and then "casts" it into a memory view, which seems unnecessary overhead.

A simple benchmark doesn't reveal much difference between the three methods, with 2. being the fastest by a thin margin.

Which of the three methods is recommended? Or is there a different, better option?

Follow-up question: I want to finally return the result as a np.ndarray, after having worked with that memory view in the function. Is a typed memory view the best choice or would I rather just use the old buffer interface as below to create an ndarray in the first place?

cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32)

976

asked Aug 27 '13 10:08

kynan

1 Answers

Look here for an answer.

The basic idea is that you want cpython.array.array and cpython.array.clone (not cython.array.*):

from cpython.array cimport array, clone  # This type is what you want and can be cast to things of # the "double[:]" syntax, so no problems there cdef array[double] armv, templatemv  templatemv = array('d')  # This is fast armv = clone(templatemv, L, False)

EDIT

It turns out that the benchmarks in that thread were rubbish. Here's my set, with my timings:

# cython: language_level=3 # cython: boundscheck=False # cython: wraparound=False  import time import sys  from cpython.array cimport array, clone from cython.view cimport array as cvarray from libc.stdlib cimport malloc, free import numpy as numpy cimport numpy as numpy  cdef int loops  def timefunc(name):     def timedecorator(f):         cdef int L, i          print("Running", name)         for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:             start = time.clock()             f(L)             end = time.clock()             print(format((end-start) / loops * 1e6, "2f"), end=" ")             sys.stdout.flush()          print("μs")     return timedecorator  print() print("INITIALISATIONS") loops = 100000  @timefunc("cpython.array buffer") def _(int L):     cdef int i     cdef array[double] arr, template = array('d')      for i in range(loops):         arr = clone(template, L, False)      # Prevents dead code elimination     str(arr[0])  @timefunc("cpython.array memoryview") def _(int L):     cdef int i     cdef double[::1] arr     cdef array template = array('d')      for i in range(loops):         arr = clone(template, L, False)      # Prevents dead code elimination     str(arr[0])  @timefunc("cpython.array raw C type") def _(int L):     cdef int i     cdef array arr, template = array('d')      for i in range(loops):         arr = clone(template, L, False)      # Prevents dead code elimination     str(arr[0])  @timefunc("numpy.empty_like memoryview") def _(int L):     cdef int i     cdef double[::1] arr     template = numpy.empty((L,), dtype='double')      for i in range(loops):         arr = numpy.empty_like(template)      # Prevents dead code elimination     str(arr[0])  @timefunc("malloc") def _(int L):     cdef int i     cdef double* arrptr      for i in range(loops):         arrptr = <double*> malloc(sizeof(double) * L)         free(arrptr)      # Prevents dead code elimination     str(arrptr[0])  @timefunc("malloc memoryview") def _(int L):     cdef int i     cdef double* arrptr     cdef double[::1] arr      for i in range(loops):         arrptr = <double*> malloc(sizeof(double) * L)         arr = <double[:L]>arrptr         free(arrptr)      # Prevents dead code elimination     str(arr[0])  @timefunc("cvarray memoryview") def _(int L):     cdef int i     cdef double[::1] arr      for i in range(loops):         arr = cvarray((L,),sizeof(double),'d')      # Prevents dead code elimination     str(arr[0])    print() print("ITERATING") loops = 1000  @timefunc("cpython.array buffer") def _(int L):     cdef int i     cdef array[double] arr = clone(array('d'), L, False)      cdef double d     for i in range(loops):         for i in range(L):             d = arr[i]      # Prevents dead-code elimination     str(d)  @timefunc("cpython.array memoryview") def _(int L):     cdef int i     cdef double[::1] arr = clone(array('d'), L, False)      cdef double d     for i in range(loops):         for i in range(L):             d = arr[i]      # Prevents dead-code elimination     str(d)  @timefunc("cpython.array raw C type") def _(int L):     cdef int i     cdef array arr = clone(array('d'), L, False)      cdef double d     for i in range(loops):         for i in range(L):             d = arr[i]      # Prevents dead-code elimination     str(d)  @timefunc("numpy.empty_like memoryview") def _(int L):     cdef int i     cdef double[::1] arr = numpy.empty((L,), dtype='double')      cdef double d     for i in range(loops):         for i in range(L):             d = arr[i]      # Prevents dead-code elimination     str(d)  @timefunc("malloc") def _(int L):     cdef int i     cdef double* arrptr = <double*> malloc(sizeof(double) * L)      cdef double d     for i in range(loops):         for i in range(L):             d = arrptr[i]      free(arrptr)      # Prevents dead-code elimination     str(d)  @timefunc("malloc memoryview") def _(int L):     cdef int i     cdef double* arrptr = <double*> malloc(sizeof(double) * L)     cdef double[::1] arr = <double[:L]>arrptr      cdef double d     for i in range(loops):         for i in range(L):             d = arr[i]      free(arrptr)      # Prevents dead-code elimination     str(d)  @timefunc("cvarray memoryview") def _(int L):     cdef int i     cdef double[::1] arr = cvarray((L,),sizeof(double),'d')      cdef double d     for i in range(loops):         for i in range(L):             d = arr[i]      # Prevents dead-code elimination     str(d)

Output:

INITIALISATIONS Running cpython.array buffer 0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs Running cpython.array memoryview 0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs Running cpython.array raw C type 0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs Running numpy.empty_like memoryview 1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs Running malloc 0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs Running malloc memoryview 1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs Running cvarray memoryview 1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs  ITERATING Running cpython.array buffer 0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs Running cpython.array memoryview 0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs Running cpython.array raw C type 0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs Running numpy.empty_like memoryview 0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs Running malloc 0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs Running malloc memoryview 0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs Running cvarray memoryview 0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs

(The reason for the "iterations" benchmark is that some methods have surprisingly different characteristics in this respect.)

In order of initialisation speed:

malloc: This is a harsh world, but it's fast. If you need to to allocate a lot of things and have unhindered iteration and indexing performance, this has to be it. But normally you're a good bet for...

cpython.array raw C type: Well damn, it's fast. And it's safe. Unfortunately it goes through Python to access its data fields. You can avoid that by using a wonderful trick:

arr.data.as_doubles[i]

which brings it up to the standard speed while removing safety! This makes this a wonderful replacement for malloc, being basically a pretty reference-counted version!

cpython.array buffer: Coming in at only three to four times the setup time of malloc, this is looks a wonderful bet. Unfortunately it has significant overhead (albeit small compared to the boundscheck and wraparound directives). That means it only really competes against full-safety variants, but it is the fastest of those to initialise. Your choice.

cpython.array memoryview: This is now an order of magnitude slower than malloc to initialise. That's a shame, but it iterates just as fast. This is the standard solution that I would suggest unless boundscheck or wraparound are on (in which case cpython.array buffer might be a more compelling tradeoff).

The rest. The only one worth anything is numpy's, due to the many fun methods attached to the objects. That's it, though.

128

answered Sep 22 '22 01:09

Veedrac

Related questions
                            
                                Is there a native templating system for plain text files in Python?
                            
                                standard_init_linux.go:211: exec user process caused "exec format error"
                            
                                Error loading MySQLdb Module 'Did you install mysqlclient or MySQL-python?'
                            
                                Loop "Forgets" to Remove Some Items [duplicate]
                            
                                Python: get datetime for '3 years ago today'
                            
                                Django 1.7 - "No migrations to apply" when run migrate after makemigrations
                            
                                Python in-memory cache with time to live
                            
                                Heroku truncates HTTP responses?
                            
                                How can I release memory after creating matplotlib figures
                            
                                Which classes cannot be subclassed?
                            
                                How to make pip "dry-run"?
                            
                                Practical examples of NLTK use [closed]
                            
                                SQL Alchemy ORM returning a single column, how to avoid common post processing
                            
                                Process to convert simple Python script into Windows executable [duplicate]
                            
                                TypeError: list indices must be integers or slices, not str
                            
                                yield in list comprehensions and generator expressions
                            
                                Writing a parser for regular expressions
                            
                                ropemacs USAGE tutorial
                            
                                Shebang Notation: Python Scripts on Windows and Linux?
                            
                                MongoKit vs MongoEngine vs Flask-MongoAlchemy for Flask [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With