The bottleneck of my code is currently a conversion from a Python list to a C array using ctypes, as described in this question.
A small experiment shows that it is indeed very slow compared with other Python operations:
import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='array("I",t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))
Gives:
1.790962941000089
0.0911122129996329
0.3200237319997541
I obtained these results with CPython 3.4.2. I get similar times with CPython 2.7.9 and PyPy 2.4.0.
I tried running the above code with perf, commenting out the timeit instructions so that only one ran at a time. I got these results:
ctypes
Performance counter stats for 'python3 perf.py':
1807,891637 task-clock (msec) # 1,000 CPUs utilized
8 context-switches # 0,004 K/sec
0 cpu-migrations # 0,000 K/sec
59 523 page-faults # 0,033 M/sec
5 755 704 178 cycles # 3,184 GHz
13 552 506 138 instructions # 2,35 insn per cycle
3 217 289 822 branches # 1779,581 M/sec
748 614 branch-misses # 0,02% of all branches
1,808349671 seconds time elapsed
array
Performance counter stats for 'python3 perf.py':
144,678718 task-clock (msec) # 0,998 CPUs utilized
0 context-switches # 0,000 K/sec
0 cpu-migrations # 0,000 K/sec
12 913 page-faults # 0,089 M/sec
458 284 661 cycles # 3,168 GHz
1 253 747 066 instructions # 2,74 insn per cycle
325 528 639 branches # 2250,011 M/sec
708 280 branch-misses # 0,22% of all branches
0,144966969 seconds time elapsed
set
Performance counter stats for 'python3 perf.py':
369,786395 task-clock (msec) # 0,999 CPUs utilized
0 context-switches # 0,000 K/sec
0 cpu-migrations # 0,000 K/sec
108 584 page-faults # 0,294 M/sec
1 175 946 161 cycles # 3,180 GHz
2 086 554 968 instructions # 1,77 insn per cycle
422 531 402 branches # 1142,636 M/sec
768 338 branch-misses # 0,18% of all branches
0,370103043 seconds time elapsed
The code with ctypes has fewer page faults than the code with set, and about the same number of branch misses as the other two. The only differences I see are more instructions and branches (though I still don't know why) and more context switches (but that is certainly a consequence of the longer run time rather than a cause).
I therefore have two questions:
The solution is to use the array module and cast the address or use the from_buffer method...
import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt="v = array('I',t);assert v.itemsize == 4; addr, count = v.buffer_info();p = ctypes.cast(addr,ctypes.POINTER(ctypes.c_uint32))",setup=setup,number=10))
print(timeit.timeit(stmt="v = array('I',t);a = (ctypes.c_uint32 * len(v)).from_buffer(v)",setup=setup,number=10))
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))
It is then many times faster when using Python 3:
$ python3 convert.py
0.08303386811167002
0.08139665238559246
1.5630637975409627
0.3013848252594471
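As a rough illustration of the from_buffer approach, the sketch below wraps the conversion in a helper and shows that the resulting ctypes array shares memory with the array.array it was built from. The helper name list_to_c_uint32 is hypothetical, and the code assumes Python 3 and a platform where the array type code "I" is 4 bytes wide (hence the explicit size check).

```python
import ctypes
from array import array


def list_to_c_uint32(values):
    """Hypothetical helper: build a ctypes uint32 array from an iterable.

    Returns the ctypes array together with the array.array that backs it.
    """
    # array.array performs the bulk int -> uint32 conversion in C.
    buf = array("I", values)
    # Assumption: "I" is 4 bytes here; fail loudly on platforms where it is not.
    if buf.itemsize != ctypes.sizeof(ctypes.c_uint32):
        raise TypeError("array('I') items are not 32 bits wide on this platform")
    # from_buffer wraps the existing memory instead of copying it.
    c_arr = (ctypes.c_uint32 * len(buf)).from_buffer(buf)
    return c_arr, buf


c_arr, buf = list_to_c_uint32(range(5))
c_arr[0] = 42          # write through the ctypes view
print(list(c_arr))     # contents seen through ctypes
print(buf[0])          # the backing array.array sees the same write
```

Because the memory is shared, the backing array.array cannot be resized while the ctypes array is alive, so it is safest to treat it as fixed-size once it has been wrapped.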
While this is not a definitive answer, the problem seems to be the constructor call with *t. Doing the following instead decreases the overhead significantly:
a = (ctypes.c_uint32 * len(t))()
a[:] = t
Test:
import timeit
setup="from array import array; import ctypes; t = [i for i in range(1000000)];"
print(timeit.timeit(stmt='(ctypes.c_uint32 * len(t))(*t)',setup=setup,number=10))
print(timeit.timeit(stmt='a = (ctypes.c_uint32 * len(t))(); a[:] = t',setup=setup,number=10))
print(timeit.timeit(stmt='array("I",t)',setup=setup,number=10))
print(timeit.timeit(stmt='set(t)',setup=setup,number=10))
Output:
1.7090932869978133
0.3084979929990368
0.08278547400186653
0.2775516299989249
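Before relying on the timings, it may be worth confirming that the different conversions really produce the same contents. The following is a minimal sketch of such a check, using a smaller list than the benchmarks; all variable names are illustrative, and it assumes Python 3 and a 4-byte "I" type code.

```python
import ctypes
from array import array

t = list(range(1000))
ArrayType = ctypes.c_uint32 * len(t)

# 1. Slow path: unpack the whole list into constructor arguments.
via_constructor = ArrayType(*t)

# 2. Create a zero-initialised array, then fill it by slice assignment.
via_slice = ArrayType()
via_slice[:] = t

# 3. Wrap an array.array buffer without copying.
backing = array("I", t)
assert backing.itemsize == ctypes.sizeof(ctypes.c_uint32)  # assumption: 4 bytes
via_buffer = ArrayType.from_buffer(backing)

# 4. Cast the buffer address to a uint32 pointer.
addr, count = backing.buffer_info()
via_pointer = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint32))

# Pointers have no length, so slice with the element count from buffer_info().
same = (
    list(via_constructor)
    == list(via_slice)
    == list(via_buffer)
    == via_pointer[:count]
    == t
)
print(same)
```

Note that the pointer from ctypes.cast does not keep the array.array alive by itself, so the backing object has to stay referenced for as long as the pointer is in use.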