In a bizarre turn of events, I've ended up using the following Python code to write the assembly generated by Numba to a file:
from numba import jit

@jit(nopython=True, nogil=True)
def six():
    return 6

with open("six.asm", "w") as f:
    for k, v in six.inspect_asm().items():
        f.write(v)
The assembly code is successfully written to the file but I can't figure out how to execute it. I've tried the following:
$ as -o six.o six.asm
$ ld six.o -o six.bin
$ chmod +x six.bin
$ ./six.bin
However, the linking step fails with the following:
ld: warning: cannot find entry symbol _start; defaulting to 00000000004000f0
six.o: In function `cpython::__main__::six$241':
<string>:(.text+0x20): undefined reference to `PyArg_UnpackTuple'
<string>:(.text+0x47): undefined reference to `PyEval_SaveThread'
<string>:(.text+0x53): undefined reference to `PyEval_RestoreThread'
<string>:(.text+0x62): undefined reference to `PyLong_FromLongLong'
<string>:(.text+0x74): undefined reference to `PyExc_RuntimeError'
<string>:(.text+0x88): undefined reference to `PyErr_SetString'
I suspect that Numba and/or the Python standard library needs to be dynamically linked with the generated object file for this to run successfully, but I'm not sure how to do that (or whether it can be done at all).
I've also tried the following where I write the intermediate LLVM code to the file instead of the assembly:
with open("six.ll", "w") as f:
    for k, v in six.inspect_llvm().items():
        f.write(v)
And then
$ lli six.ll
But this fails as well with the following error:
'main' function not found in module.
UPDATE:
It turns out that there is a utility that reports the flags to pass to the ld command to dynamically link the Python standard library.
$ python3-config --ldflags
Returns
-L/Users/rayan/anaconda3/lib/python3.7/config-3.7m-darwin -lpython3.7m -ldl -framework CoreFoundation
Running the following again, this time with the correct flags:
$ as -o six.o six.asm
$ ld six.o -o six.bin -L/Users/rayan/anaconda3/lib/python3.7/config-3.7m-darwin -lpython3.7m -ldl -framework CoreFoundation
$ chmod +x six.bin
$ ./six.bin
I am now getting
ld: warning: No version-min specified on command line
ld: entry point (_main) undefined. for inferred architecture x86_64
I have tried adding a _main label in the assembly file, but that doesn't seem to do anything. Any ideas on how to define the entry point?
UPDATE 2:
Here's the assembly code in case that's useful. It seems the target function is the one with the label _ZN8__main__7six$241E:
.text
.file "<string>"
.globl _ZN8__main__7six$241E
.p2align 4, 0x90
.type _ZN8__main__7six$241E,@function
_ZN8__main__7six$241E:
movq $6, (%rdi)
xorl %eax, %eax
retq
.Lfunc_end0:
.size _ZN8__main__7six$241E, .Lfunc_end0-_ZN8__main__7six$241E
.globl _ZN7cpython8__main__7six$241E
.p2align 4, 0x90
.type _ZN7cpython8__main__7six$241E,@function
_ZN7cpython8__main__7six$241E:
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
movq %rsi, %rdi
movabsq $.const.six, %rsi
movabsq $PyArg_UnpackTuple, %r8
xorl %edx, %edx
xorl %ecx, %ecx
xorl %eax, %eax
callq *%r8
testl %eax, %eax
je .LBB1_3
movabsq $_ZN08NumbaEnv8__main__7six$241E, %rax
cmpq $0, (%rax)
je .LBB1_2
movabsq $PyEval_SaveThread, %rax
callq *%rax
movabsq $PyEval_RestoreThread, %rcx
movq %rax, %rdi
callq *%rcx
movabsq $PyLong_FromLongLong, %rax
movl $6, %edi
popq %rcx
.cfi_def_cfa_offset 8
jmpq *%rax
.LBB1_2:
.cfi_def_cfa_offset 16
movabsq $PyExc_RuntimeError, %rdi
movabsq $".const.missing Environment", %rsi
movabsq $PyErr_SetString, %rax
callq *%rax
.LBB1_3:
xorl %eax, %eax
popq %rcx
.cfi_def_cfa_offset 8
retq
.Lfunc_end1:
.size _ZN7cpython8__main__7six$241E, .Lfunc_end1-_ZN7cpython8__main__7six$241E
.cfi_endproc
.globl cfunc._ZN8__main__7six$241E
.p2align 4, 0x90
.type cfunc._ZN8__main__7six$241E,@function
cfunc._ZN8__main__7six$241E:
movl $6, %eax
retq
.Lfunc_end2:
.size cfunc._ZN8__main__7six$241E, .Lfunc_end2-cfunc._ZN8__main__7six$241E
.type _ZN08NumbaEnv8__main__7six$241E,@object
.comm _ZN08NumbaEnv8__main__7six$241E,8,8
.type .const.six,@object
.section .rodata,"a",@progbits
.const.six:
.asciz "six"
.size .const.six, 4
.type ".const.missing Environment",@object
.p2align 4
".const.missing Environment":
.asciz "missing Environment"
.size ".const.missing Environment", 20
.section ".note.GNU-stack","",@progbits
After browsing [PyData.Numba]: Numba docs, plus some debugging and trial and error, I reached a conclusion: it seems you're on the wrong path (as was also pointed out in the comments).
Numba converts Python code (functions) to machine code (for the obvious reason: speed). It does everything (converting, building, inserting into the running process) on the fly; the programmer only needs to decorate the function with e.g. @numba.jit ([PyData.Numba]: Just-in-Time compilation).
The behavior that you're experiencing is correct. The Dispatcher object (created by decorating the six function) only generates (assembly) code for the function itself (there's no main in there, as the code executes in the current process, i.e. under the Python interpreter's main function). So, it's normal for the linker to complain that there's no main symbol. It's like writing a C file that only contains:
int six() {
    return 6;
}
In order for things to work properly, you have to:
1. Build the .asm file into an .o (object) file (done)
2. Include the .o file from #1. into a library, which can be static or dynamic. The library is to be linked into the (final) executable. This step is optional, as you could use the .o file directly
3. Build another file that defines main (and calls six, which I assume is the whole purpose) into an .o file. As I'm not very comfortable with assembly, I wrote it in C
4. Link the 2 entities (from #2. (#1.) and #3.) together
As an alternative, you could take a look at [PyData.Numba]: Compiling code ahead of time, but bear in mind that it would generate a Python (extension) module.
Back to the current problem. I ran the test on Ubuntu 18.04 64-bit.
code00.py:
#!/usr/bin/env python

import sys
import math
import numba


@numba.jit(nopython=True, nogil=True)
def six():
    return 6


def main(*argv):
    six()  # Call the function(s), otherwise `inspect_asm()` would return empty dict
    speed_funcs = [
        (six, numba.int32()),
    ]
    for func, _ in speed_funcs:
        file_name_asm = "numba_{0:s}_{1:s}_{2:03d}_{3:02d}{4:02d}{5:02d}.asm".format(func.__name__, sys.platform, int(round(math.log2(sys.maxsize))) + 1, *sys.version_info[:3])
        asm = func.inspect_asm()
        print("Writing to {0:s}:".format(file_name_asm))
        with open(file_name_asm, "wb") as fout:
            for k, v in asm.items():
                print(" {0:}".format(k))
                fout.write(v.encode())


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")
main00.c:
#include <stdio.h>
#include <dlfcn.h>

//#define SYMBOL_SIX "_ZN8__main__7six$241E"
#define SYMBOL_SIX "cfunc._ZN8__main__7six$241E"

typedef int (*SixFuncPtr)();

int main() {
    void *pMod = dlopen("./libnumba_six_linux.so", RTLD_LAZY);
    if (!pMod) {
        printf("Error (%s) loading module\n", dlerror());
        return -1;
    }
    SixFuncPtr pSixFunc = dlsym(pMod, SYMBOL_SIX);
    if (!pSixFunc)
    {
        printf("Error (%s) loading function\n", dlerror());
        dlclose(pMod);
        return -2;
    }
    printf("six() returned: %d\n", (*pSixFunc)());
    dlclose(pMod);
    return 0;
}
build.sh:
CC=gcc
LIB_BASE_NAME=numba_six_linux
FLAG_LD_LIB_NUMBALINUX="-Wl,-L. -Wl,-l${LIB_BASE_NAME}"
FLAG_LD_LIB_PYTHON="-Wl,-L/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu -Wl,-lpython3.7m"
rm -f *.asm *.o *.a *.so *.exe
echo Generate .asm
python3 code00.py
echo Assemble
as -o ${LIB_BASE_NAME}.o ${LIB_BASE_NAME}_064_030705.asm
echo Link library
LIB_NUMBA="./lib${LIB_BASE_NAME}.so"
#ar -scr ${LIB_NUMBA} ${LIB_BASE_NAME}.o
${CC} -o ${LIB_NUMBA} -shared ${LIB_BASE_NAME}.o ${FLAG_LD_LIB_PYTHON}
echo Dump library contents
nm -S ${LIB_NUMBA}
#objdump -t ${LIB_NUMBA}
echo Compile and link executable
${CC} -o main00.exe main00.c -ldl
echo Exit script
Output:
(py_venv_pc064_03.07.05_test0) [cfati@cfati-ubtu-18-064-00:~/Work/Dev/StackOverflow/q061678226]> ~/sopr.sh
*** Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ***
[064bit prompt]>
[064bit prompt]> ls
build.sh code00.py main00.c
[064bit prompt]>
[064bit prompt]> ./build.sh
Generate .asm
Python 3.7.5 (default, Nov 7 2019, 10:50:52) [GCC 8.3.0] 64bit on linux
Writing to numba_six_linux_064_030705.asm:
 ()
Done.
Assemble
Link library
Dump library contents
0000000000201020 B __bss_start
00000000000008b0 0000000000000006 T cfunc._ZN8__main__7six$241E
0000000000201020 0000000000000001 b completed.7698
00000000000008e0 0000000000000014 r .const.missing Environment
00000000000008d0 0000000000000004 r .const.six
                 w __cxa_finalize
0000000000000730 t deregister_tm_clones
00000000000007c0 t __do_global_dtors_aux
0000000000200e58 t __do_global_dtors_aux_fini_array_entry
0000000000201018 d __dso_handle
0000000000200e60 d _DYNAMIC
0000000000201020 D _edata
0000000000201030 B _end
00000000000008b8 T _fini
0000000000000800 t frame_dummy
0000000000200e50 t __frame_dummy_init_array_entry
0000000000000990 r __FRAME_END__
0000000000201000 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000008f4 r __GNU_EH_FRAME_HDR
00000000000006f0 T _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U PyArg_UnpackTuple
                 U PyErr_SetString
                 U PyEval_RestoreThread
                 U PyEval_SaveThread
                 U PyExc_RuntimeError
                 U PyLong_FromLongLong
0000000000000770 t register_tm_clones
0000000000201020 d __TMC_END__
0000000000201028 0000000000000008 B _ZN08NumbaEnv8__main__7six$241E
0000000000000820 0000000000000086 T _ZN7cpython8__main__7six$241E
0000000000000810 000000000000000a T _ZN8__main__7six$241E
Compile and link executable
Exit script
[064bit prompt]>
[064bit prompt]> ls
build.sh code00.py libnumba_six_linux.so main00.c main00.exe numba_six_linux_064_030705.asm numba_six_linux.o
[064bit prompt]>
[064bit prompt]> # Run the executable
[064bit prompt]>
[064bit prompt]> ./main00.exe
six() returned: 6
[064bit prompt]>
Also posting (since it's important) numba_six_linux_064_030705.asm:
.text
.file "<string>"
.globl _ZN8__main__7six$241E
.p2align 4, 0x90
.type _ZN8__main__7six$241E,@function
_ZN8__main__7six$241E:
movq $6, (%rdi)
xorl %eax, %eax
retq
.Lfunc_end0:
.size _ZN8__main__7six$241E, .Lfunc_end0-_ZN8__main__7six$241E
.globl _ZN7cpython8__main__7six$241E
.p2align 4, 0x90
.type _ZN7cpython8__main__7six$241E,@function
_ZN7cpython8__main__7six$241E:
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
movq %rsi, %rdi
movabsq $.const.six, %rsi
movabsq $PyArg_UnpackTuple, %r8
xorl %edx, %edx
xorl %ecx, %ecx
xorl %eax, %eax
callq *%r8
testl %eax, %eax
je .LBB1_3
movabsq $_ZN08NumbaEnv8__main__7six$241E, %rax
cmpq $0, (%rax)
je .LBB1_2
movabsq $PyEval_SaveThread, %rax
callq *%rax
movabsq $PyEval_RestoreThread, %rcx
movq %rax, %rdi
callq *%rcx
movabsq $PyLong_FromLongLong, %rax
movl $6, %edi
popq %rcx
.cfi_def_cfa_offset 8
jmpq *%rax
.LBB1_2:
.cfi_def_cfa_offset 16
movabsq $PyExc_RuntimeError, %rdi
movabsq $".const.missing Environment", %rsi
movabsq $PyErr_SetString, %rax
callq *%rax
.LBB1_3:
xorl %eax, %eax
popq %rcx
.cfi_def_cfa_offset 8
retq
.Lfunc_end1:
.size _ZN7cpython8__main__7six$241E, .Lfunc_end1-_ZN7cpython8__main__7six$241E
.cfi_endproc
.globl cfunc._ZN8__main__7six$241E
.p2align 4, 0x90
.type cfunc._ZN8__main__7six$241E,@function
cfunc._ZN8__main__7six$241E:
movl $6, %eax
retq
.Lfunc_end2:
.size cfunc._ZN8__main__7six$241E, .Lfunc_end2-cfunc._ZN8__main__7six$241E
.type _ZN08NumbaEnv8__main__7six$241E,@object
.comm _ZN08NumbaEnv8__main__7six$241E,8,8
.type .const.six,@object
.section .rodata,"a",@progbits
.const.six:
.asciz "six"
.size .const.six, 4
.type ".const.missing Environment",@object
.p2align 4
".const.missing Environment":
.asciz "missing Environment"
.size ".const.missing Environment", 20
.section ".note.GNU-stack","",@progbits
Notes:
numba_six_linux_064_030705.asm (and everything that derives from it) contains the code for the six function. Actually, there are a bunch of symbols (on OSX, you can also use the native otool -T) like:
cfunc._ZN8__main__7six$241E - the (C) function itself
_ZN7cpython8__main__7six$241E - the Python wrapper (it calls Python C API functions such as PyArg_UnpackTuple, so it depends on libpython): nopython=True has no effect in this case
Also, the main part of these symbols doesn't refer to an executable entry point (main function), but to a Python module's top-level namespace (__main__). After all, this code is supposed to be run from Python
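To make the naming point concrete, here is a minimal sketch that splits such symbols into their parts. It assumes only the length-prefixed layout visible in the listing above (an optional prefix, `_ZN`, a run of `<length><name>` pairs, then `E`); Numba's real mangling scheme has more rules than this, and the helper name `split_mangled` is made up for illustration.

```python
def split_mangled(symbol):
    """Split '<prefix>_ZN<len><name>...E' into (prefix, [names]).

    Hypothetical helper: it only handles the simple layout seen in the
    symbols from this listing, not the full mangling scheme.
    """
    marker = "_ZN"
    start = symbol.find(marker)
    if start < 0 or not symbol.endswith("E"):
        raise ValueError("not a _ZN...E style symbol: " + symbol)
    prefix = symbol[:start]
    body = symbol[start + len(marker):-1]  # drop the marker and the trailing 'E'
    components = []
    pos = 0
    while pos < len(body):
        # Read the decimal length prefix (e.g. '8' or '08').
        digits_end = pos
        while digits_end < len(body) and body[digits_end].isdigit():
            digits_end += 1
        if digits_end == pos:
            raise ValueError("expected a length prefix at offset %d" % pos)
        length = int(body[pos:digits_end])
        name = body[digits_end:digits_end + length]
        if len(name) != length:
            raise ValueError("truncated component in " + symbol)
        components.append(name)
        pos = digits_end + length
    return prefix, components


if __name__ == "__main__":
    for sym in (
        "_ZN8__main__7six$241E",
        "_ZN7cpython8__main__7six$241E",
        "cfunc._ZN8__main__7six$241E",
        "_ZN08NumbaEnv8__main__7six$241E",
    ):
        print(sym, "->", split_mangled(sym))
```

In each case the `__main__` component is the module name, not a program entry point.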
Because the plain C function's name contains a dot (.), I couldn't call it directly from C (it's not a valid identifier), so I had to load the .so and the function manually (dlopen / dlsym), resulting in more code than simply calling the function.
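For comparison, the same lookup-by-string idea can be sketched from Python with ctypes, which wraps dlopen / dlsym on Linux. The helper name `load_function` is hypothetical, and the runnable part uses `abs` from the C runtime as a stand-in so that it works without the generated library; the commented lines show how it might be pointed at libnumba_six_linux.so, which I have not tested.

```python
import ctypes


def load_function(library, symbol_name, restype=ctypes.c_int, argtypes=()):
    """Fetch `symbol_name` from a loaded ctypes library and set its signature.

    The name is passed as a string, so characters such as '.' and '$'
    (which are invalid in C identifiers) are not a problem.
    """
    func = getattr(library, symbol_name)
    func.restype = restype
    func.argtypes = list(argtypes)
    return func


if __name__ == "__main__":
    # Stand-in demo: resolve `abs` from the symbols already loaded in this
    # process (CDLL(None) behaves like dlopen(NULL) on Linux and macOS).
    libc = ctypes.CDLL(None)
    c_abs = load_function(libc, "abs", ctypes.c_int, [ctypes.c_int])
    print("abs(-6) =", c_abs(-6))

    # Assumption, untested: with the library from build.sh present, the
    # equivalent of main00.c would look like this.
    # lib = ctypes.CDLL("./libnumba_six_linux.so")
    # six = load_function(lib, "cfunc._ZN8__main__7six$241E")
    # print("six() returned:", six())
```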
I didn't try it, but I think it would make sense that the following (manual) changes to the generated .asm file would simplify the work:
Thanks to @PeterCordes, who shared that exact piece of info ([GNU.GCC]: Controlling Names Used in Assembler Code) that I was missing, here's a much simpler version.
main01.c:
#include <stdio.h>

extern int six() asm ("cfunc._ZN8__main__7six$241E");

int main() {
    printf("six() returned: %d\n", six());
}
Output:
[064bit prompt]> # Resume from previous point + main01.c
[064bit prompt]>
[064bit prompt]> ls
build.sh code00.py libnumba_six_linux.so main00.c main00.exe main01.c numba_six_linux_064_030705.asm numba_six_linux.o
[064bit prompt]>
[064bit prompt]> ar -scr libnumba_six_linux.a numba_six_linux.o
[064bit prompt]>
[064bit prompt]> gcc -o main01.exe main01.c ./libnumba_six_linux.a -Wl,-L/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu -Wl,-lpython3.7m
[064bit prompt]>
[064bit prompt]> ls
build.sh code00.py libnumba_six_linux.a libnumba_six_linux.so main00.c main00.exe main01.c main01.exe numba_six_linux_064_030705.asm numba_six_linux.o
[064bit prompt]>
[064bit prompt]> ./main01.exe
six() returned: 6
[064bit prompt]>