
Self-built extension module slower than built-in C module

To learn how to create C extensions, I decided to just copy a built-in .c file (in this case itertoolsmodule.c) and place it in my package. I only changed the names inside the module from itertools to mypkg.

Then I compiled it (Windows 10, MSVC Community 14) as a setuptools.Extension:

from setuptools import setup, Extension

itertools_module = Extension('mypkg.itertoolscopy',
                              sources=['src/itertoolsmodulecopy.c'])

setup(...
      ext_modules=[itertools_module])

The default build uses the compiler flags /c /nologo /Ox /W3 /GL /DNDEBUG /MD, and I read somewhere that these defaults match the settings Python itself was compiled with. However, I use conda (64-bit setup), so this might not necessarily be true.
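
One way to check what the running interpreter itself reports about its build is the standard sysconfig and platform modules. The sketch below is only illustrative: the helper name reported_build_info is made up here, and on Windows variables such as CFLAGS or OPT may simply come back as None, so the output is a hint rather than proof of how the binary was really compiled.

```python
import platform
import sysconfig


def reported_build_info(names=('CC', 'CFLAGS', 'OPT', 'LDFLAGS')):
    """Collect build-related values that the running interpreter reports.

    Hypothetical helper for illustration. Unknown variables are returned
    as None by sysconfig.get_config_var, which is common on Windows.
    """
    info = {name: sysconfig.get_config_var(name) for name in names}
    info['platform'] = sysconfig.get_platform()
    # Compiler banner recorded at build time, e.g. an MSC or GCC version string.
    info['compiler'] = platform.python_compiler()
    return info


if __name__ == '__main__':
    for key, value in sorted(reported_build_info().items()):
        print('{}: {}'.format(key, value))
```

Running this once with the stock interpreter and once with the conda interpreter, then comparing the two outputs, shows whether the two distributions at least claim the same compiler and flags.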

It all went well, but a benchmark for filterfalse showed that my copy is almost a factor of 2 slower than the built-in:

import mypkg
import itertools

import random

a = [random.random() for _ in range(500000)]
func = None

%timeit list(filter(func, a))
100 loops, best of 3: 3.42 ms per loop
%timeit list(itertools.filterfalse(func, a))
100 loops, best of 3: 3.41 ms per loop
%timeit list(mypkg.filterfalse(func, a))
100 loops, best of 3: 6.77 ms per loop

However, for smaller iterables the discrepancy also becomes smaller:

a = [random.random() for _ in range(500)]  # 1 / 1000 of the elements

%timeit list(filter(func, a))
100000 loops, best of 3: 9.66 µs per loop
%timeit list(itertools.filterfalse(func, a))
100000 loops, best of 3: 10.8 µs per loop
%timeit list(mypkg.filterfalse(func, a))
100000 loops, best of 3: 14.4 µs per loop

I wasn't able to explain this difference in speed but I have to admit that I'm not too familiar with compiling C-code. I'm at a loss what actually makes it slower.

The results are the same on python 2.7 with ifilter and ifilterfalse and the 2.7 version of the itertoolsmodule.c file.
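
For anyone who wants to repeat the measurement outside IPython, here is a rough equivalent of the %timeit calls using the standard timeit module. It is a sketch under assumptions: the helper names best_time and compare are invented for illustration, it uses the Python 3 names, and the self-built module has to be added to the candidates by hand since it only exists on the machine that compiled it.

```python
import itertools
import random
import timeit


def best_time(func, data, number=10, repeat=3):
    """Best per-call time in seconds for list(func(None, data))."""
    timer = timeit.Timer(lambda: list(func(None, data)))
    # Take the minimum over the repeats, like "best of 3" in %timeit.
    return min(timer.repeat(repeat=repeat, number=number)) / number


def compare(candidates, size=500000):
    """Time every candidate on the same list of random floats."""
    data = [random.random() for _ in range(size)]
    return {name: best_time(func, data) for name, func in candidates.items()}


if __name__ == '__main__':
    candidates = {
        'filter': filter,
        'itertools.filterfalse': itertools.filterfalse,
        # Add the self-built copy here, for example:
        # 'mypkg.filterfalse': mypkg.filterfalse,
    }
    for name, seconds in sorted(compare(candidates).items()):
        print('{}: {:.3f} ms per call'.format(name, seconds * 1e3))
```

Timing all candidates against the same list in one process keeps the comparison fair, and passing size=500 gives the small-iterable case.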

Does anyone know what makes the code perform worse than the built-ins, and how one could speed it up?

MSeifert asked Sep 21 '16 18:09



1 Answer

Curious about this problem myself, I set out to reproduce the findings. Though the OP is on Windows, it was slightly easier for me to attempt this on Linux. I did eventually try it on Windows too, but I'll walk you through what I did!

setup

I made a little test harness. It's a shell script, but it makes it easier for someone else to try what I'm trying :D

test.sh

#!/usr/bin/env bash
set -euxo pipefail
rm -rf itertoolsmodule.c setup.py venv

PYTHON=3.5
FUNCTION=filterfalse
INIT=PyInit_
#PYTHON=2.7
#FUNCTION=ifilterfalse
#INIT=init

wget "https://raw.githubusercontent.com/python/cpython/$PYTHON/Modules/itertoolsmodule.c"
sed -i "s/${INIT}itertools/${INIT}_myitertools/" itertoolsmodule.c
sed -i 's/"itertools"/"_myitertools"/' itertoolsmodule.c

cat > setup.py << EOF
from setuptools import setup, Extension
mod = Extension('_myitertools', ['itertoolsmodule.c'])
setup(name='foo', ext_modules=[mod])
EOF

virtualenv venv -ppython"$PYTHON"
venv/bin/pip install . -v

cat > test.py << EOF
import _myitertools
import itertools
import random
import time


a = [random.random() for _ in range(500000)]
iterations = range(10)
seconds = 50


def builtins_filter():
    for _ in iterations:
        list(filter(None, a))

_itertools_filterfalse = itertools.$FUNCTION
def itertools_filterfalse():
    for _ in iterations:
        list(_itertools_filterfalse(None, a))

_myitertools_filterfalse = _myitertools.$FUNCTION
def myitertools_filterfalse():
    for _ in iterations:
        list(_myitertools_filterfalse(None, a))


def runbench(func):
    start = time.time()
    end = start + seconds
    iterations = 0
    while time.time() < end:
        func()
        iterations += 1
    return iterations


for func in (builtins_filter, itertools_filterfalse, myitertools_filterfalse):
    print('*' * 79)
    print(func.__name__)
    print('{} iterations in {} seconds'.format(runbench(func), seconds))
EOF

venv/bin/python test.py

ubuntu16.04 x86_64 python3.5.2 (stock, apt)

(I cut out the (imo) unimportant parts):

$ ./test.sh
+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=3.5
+ FUNCTION=filterfalse
+ INIT=PyInit_

...

+ venv/bin/pip install . -v

...

    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/tmp/foo/venv/include/python3.5m -c itertoolsmodule.c -o build/temp.linux-x86_64-3.5/itertoolsmodule.o
    creating build/lib.linux-x86_64-3.5
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/itertoolsmodule.o -o build/lib.linux-x86_64-3.5/_myitertools.cpython-35m-x86_64-linux-gnu.so

...

+ venv/bin/python test.py
*******************************************************************************
builtins_filter
1401 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
1977 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
1981 iterations in 50 seconds

ubuntu16.04 x86_64 python2.7.12 (stock, apt)

+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=2.7
+ FUNCTION=ifilterfalse
+ INIT=init

...

+ venv/bin/pip install . -v

...

    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c itertoolsmodule.c -o build/temp.linux-x86_64-2.7/itertoolsmodule.o
    creating build/lib.linux-x86_64-2.7
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-Bsymbolic-functions -Wl,-z,relro -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/itertoolsmodule.o -o build/lib.linux-x86_64-2.7/_myitertools.so

...

+ venv/bin/python test.py
*******************************************************************************
builtins_filter
871 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
1918 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
1863 iterations in 50 seconds

Windows!

For Windows, I changed the script slightly so it built virtualenvs using C:\Python##\python.exe (I use msysgit, so I have some amount of a unix toolset: bash, etc.). I also changed things from bin to Scripts (for virtualenv), etc. I don't have/use conda, so these will just be stock Python on Windows 10.
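
The bin versus Scripts difference is the main thing that changes between platforms. A minimal sketch of picking the right path from Python, rather than editing the script by hand, could look like this. The helper name venv_executable and its windows parameter are hypothetical and only there for illustration.

```python
import os


def venv_executable(venv_dir, name, windows=None):
    """Return the path of an executable inside a virtualenv.

    Hypothetical helper: virtualenv puts executables in 'Scripts' on
    Windows and in 'bin' elsewhere. Pass windows=True/False to override
    the detection based on os.name.
    """
    if windows is None:
        windows = os.name == 'nt'
    subdir = 'Scripts' if windows else 'bin'
    suffix = '.exe' if windows else ''
    return os.path.join(venv_dir, subdir, name + suffix)


if __name__ == '__main__':
    print(venv_executable('venv', 'pip'))
    print(venv_executable('venv', 'python'))
```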

windows 10 python 2.7.9 (stock, msi installer)

+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=2.7
+ FUNCTION=ifilterfalse
+ INIT=init

...

+ venv/Scripts/pip install . -v

...

    C:\Users\Anthony\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python27\include -Ic:\users\anthony\appdata\local\temp\foo\venv\PC /Tcitertoolsmodule.c /Fobuild\temp.win32-2.7\Release\itertoolsmodule.obj
itertoolsmodule.c
    creating build\lib.win32-2.7
    C:\Users\Anthony\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\Python27\Libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\PCbuild /EXPORT:init_myitertools build\temp.win32-2.7\Release\itertoolsmodule.obj /OUT:build\lib.win32-2.7\_myitertools.pyd /IMPLIB:build\temp.win32-2.7\Release\_myitertools.lib /MANIFESTFILE:build\temp.win32-2.7\Release\_myitertools.pyd.manifest

...

+ venv/Scripts/python test.py
*******************************************************************************
builtins_filter
914 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
2352 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
2266 iterations in 50 seconds

windows 10 python3.5.1 (stock, msi installer)

+ rm -rf itertoolsmodule.c setup.py venv
+ PYTHON=3.5
+ FUNCTION=filterfalse
+ INIT=PyInit_

...

+ venv/Scripts/pip install . -v

...

    D:\Programs\VS2015\VC\BIN\amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Python35\include -IC:\Python35\include -ID:\Programs\VS2015\VC\INCLUDE -ID:\Programs\VS2015\VC\ATLMFC\INCLUDE "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\winrt" /Tcitertoolsmodule.c /Fobuild\temp.win-amd64-3.5\Release\itertoolsmodule.obj
itertoolsmodule.c
    creating C:\Temp\pip-1fnf27jo-build\build\lib.win-amd64-3.5
    D:\Programs\VS2015\VC\BIN\amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Python35\Libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\PCbuild\amd64 /LIBPATH:D:\Programs\VS2015\VC\LIB\amd64 /LIBPATH:D:\Programs\VS2015\VC\ATLMFC\LIB\amd64 "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" /EXPORT:PyInit__myitertools build\temp.win-amd64-3.5\Release\itertoolsmodule.obj /OUT:build\lib.win-amd64-3.5\_myitertools.cp35-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.5\Release\_myitertools.cp35-win_amd64.lib

...

+ venv/Scripts/python test.py
*******************************************************************************
builtins_filter
658 iterations in 50 seconds
*******************************************************************************
itertools_filterfalse
2601 iterations in 50 seconds
*******************************************************************************
myitertools_filterfalse
2715 iterations in 50 seconds

Conclusion

At the very least, my tests with stock Python show that the self-built extension module does not exhibit different performance characteristics: on every platform I tried, it stayed within about 5% of the built-in itertools.
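
To put a number on that, the iteration counts reported above can be turned into a relative throughput (self-built copy divided by built-in itertools). The counts are copied from the four runs; the function name relative_throughput is just an illustrative choice.

```python
# (itertools_filterfalse, myitertools_filterfalse) iteration counts
# taken from the benchmark output of the four runs above.
RESULTS = {
    'ubuntu 16.04, python 3.5.2': (1977, 1981),
    'ubuntu 16.04, python 2.7.12': (1918, 1863),
    'windows 10, python 2.7.9': (2352, 2266),
    'windows 10, python 3.5.1': (2601, 2715),
}


def relative_throughput(builtin_count, copy_count):
    """Ratio above 1.0 means the self-built copy completed more iterations."""
    return copy_count / float(builtin_count)


if __name__ == '__main__':
    for platform_name, (builtin_count, copy_count) in sorted(RESULTS.items()):
        ratio = relative_throughput(builtin_count, copy_count)
        print('{}: {:.3f}'.format(platform_name, ratio))
```

All four ratios land between roughly 0.96 and 1.05, which is nowhere near the factor of 2 from the question.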

Welp, I spent a half hour on this and didn't produce a reproduction. Hopefully this is helpful for the next poor soul who attempts this. I can only guess that conda is doing some additional optimization and then shipping a pyconfig.h file which lies about the flags used to compile. Though to be honest, I haven't yet ventured into the conda space, so I don't know how their ecosystem works.

Anthony Sottile answered Sep 28 '22 11:09