Executing the assembly generated by Numba

I'm using the following Python code to write the assembly generated by Numba to a file:

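from numba import jit

@jit(nopython=True, nogil=True)
def six():
    return 6

six()  # trigger compilation; inspect_asm() returns an empty dict until the function has been called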

with open("six.asm", "w") as f:
    for k, v in six.inspect_asm().items():
        f.write(v)

The assembly code is successfully written to the file but I can't figure out how to execute it. I've tried the following:

$ as -o six.o six.asm
$ ld six.o -o six.bin
$ chmod +x six.bin
$ ./six.bin

However, the linking step fails with the following:

ld: warning: cannot find entry symbol _start; defaulting to 00000000004000f0
six.o: In function `cpython::__main__::six$241':
<string>:(.text+0x20): undefined reference to `PyArg_UnpackTuple'
<string>:(.text+0x47): undefined reference to `PyEval_SaveThread'
<string>:(.text+0x53): undefined reference to `PyEval_RestoreThread'
<string>:(.text+0x62): undefined reference to `PyLong_FromLongLong'
<string>:(.text+0x74): undefined reference to `PyExc_RuntimeError'
<string>:(.text+0x88): undefined reference to `PyErr_SetString'

I suspect that Numba and/or the Python library need to be dynamically linked with the generated object file for this to run successfully, but I'm not sure how that can be done (if it can be done at all).

I've also tried the following where I write the intermediate LLVM code to the file instead of the assembly:

with open("six.ll", "w") as f:
    for k, v in six.inspect_llvm().items():
        f.write(v)

And then

$ lli six.ll

But this fails as well with the following error:

'main' function not found in module.

UPDATE:

It turns out that there exists a utility to find the relevant flags to pass to the ld command to dynamically link the Python standard library.

$ python3-config --ldflags

Returns

-L/Users/rayan/anaconda3/lib/python3.7/config-3.7m-darwin -lpython3.7m -ldl -framework CoreFoundation 
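
The same kind of information can also be read from inside Python, which avoids picking up a python3-config that belongs to a different interpreter. A minimal sketch, assuming a POSIX build of CPython where the LIBPL, LIBDIR and LDVERSION configuration variables are set; python_ldflags is a name made up for this example, and its output is not guaranteed to match python3-config --ldflags exactly:

```python
import sysconfig


def python_ldflags():
    """Return linker flags for the running interpreter's libpython (hypothetical helper)."""
    flags = []
    # LIBPL is the config directory, LIBDIR the generic library directory.
    for var in ("LIBPL", "LIBDIR"):
        path = sysconfig.get_config_var(var)
        if path:
            flags.append("-L" + path)
    # LDVERSION is e.g. "3.7m"; fall back to the plain "major.minor" version.
    version = sysconfig.get_config_var("LDVERSION") or sysconfig.get_python_version()
    flags.append("-lpython" + version)
    return flags


if __name__ == "__main__":
    print(" ".join(python_ldflags()))
```

Platform specific extras (such as -ldl or -framework CoreFoundation above) are not covered by this sketch.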

Running the following again, this time with the correct flags:

$ as -o six.o six.asm
$ ld six.o -o six.bin -L/Users/rayan/anaconda3/lib/python3.7/config-3.7m-darwin -lpython3.7m -ldl -framework CoreFoundation 
$ chmod +x six.bin
$ ./six.bin

I am now getting

ld: warning: No version-min specified on command line
ld: entry point (_main) undefined. for inferred architecture x86_64

I have tried adding a _main label in the assembly file but that doesn't seem to do anything. Any ideas on how to define the entry point?

UPDATE 2:

Here's the assembly code, in case that's useful. The target function seems to be the one with the label _ZN8__main__7six$241E:

    .text
    .file   "<string>"
    .globl  _ZN8__main__7six$241E
    .p2align    4, 0x90
    .type   _ZN8__main__7six$241E,@function
_ZN8__main__7six$241E:
    movq    $6, (%rdi)
    xorl    %eax, %eax
    retq
.Lfunc_end0:
    .size   _ZN8__main__7six$241E, .Lfunc_end0-_ZN8__main__7six$241E

    .globl  _ZN7cpython8__main__7six$241E
    .p2align    4, 0x90
    .type   _ZN7cpython8__main__7six$241E,@function
_ZN7cpython8__main__7six$241E:
    .cfi_startproc
    pushq   %rax
    .cfi_def_cfa_offset 16
    movq    %rsi, %rdi
    movabsq $.const.six, %rsi
    movabsq $PyArg_UnpackTuple, %r8
    xorl    %edx, %edx
    xorl    %ecx, %ecx
    xorl    %eax, %eax
    callq   *%r8
    testl   %eax, %eax
    je  .LBB1_3
    movabsq $_ZN08NumbaEnv8__main__7six$241E, %rax
    cmpq    $0, (%rax)
    je  .LBB1_2
    movabsq $PyEval_SaveThread, %rax
    callq   *%rax
    movabsq $PyEval_RestoreThread, %rcx
    movq    %rax, %rdi
    callq   *%rcx
    movabsq $PyLong_FromLongLong, %rax
    movl    $6, %edi
    popq    %rcx
    .cfi_def_cfa_offset 8
    jmpq    *%rax
.LBB1_2:
    .cfi_def_cfa_offset 16
    movabsq $PyExc_RuntimeError, %rdi
    movabsq $".const.missing Environment", %rsi
    movabsq $PyErr_SetString, %rax
    callq   *%rax
.LBB1_3:
    xorl    %eax, %eax
    popq    %rcx
    .cfi_def_cfa_offset 8
    retq
.Lfunc_end1:
    .size   _ZN7cpython8__main__7six$241E, .Lfunc_end1-_ZN7cpython8__main__7six$241E
    .cfi_endproc

    .globl  cfunc._ZN8__main__7six$241E
    .p2align    4, 0x90
    .type   cfunc._ZN8__main__7six$241E,@function
cfunc._ZN8__main__7six$241E:
    movl    $6, %eax
    retq
.Lfunc_end2:
    .size   cfunc._ZN8__main__7six$241E, .Lfunc_end2-cfunc._ZN8__main__7six$241E

    .type   _ZN08NumbaEnv8__main__7six$241E,@object
    .comm   _ZN08NumbaEnv8__main__7six$241E,8,8
    .type   .const.six,@object
    .section    .rodata,"a",@progbits
.const.six:
    .asciz  "six"
    .size   .const.six, 4

    .type   ".const.missing Environment",@object
    .p2align    4
".const.missing Environment":
    .asciz  "missing Environment"
    .size   ".const.missing Environment", 20


    .section    ".note.GNU-stack","",@progbits
Rayan Hatout asked May 08 '20 11:05


1 Answer

After browsing [PyData.Numba]: Numba docs, some debugging, and trial and error, I reached a conclusion: you seem to be off the path in your quest (as was also pointed out in the comments).

Numba converts Python code (functions) to machine code (for the obvious reason: speed). It does everything (convert, build, insert into the running process) on the fly; the programmer only needs to decorate the function with e.g. @numba.jit ([PyData.Numba]: Just-in-Time compilation).

The behavior that you're experiencing is correct. The Dispatcher object (created by decorating the six function) only generates (assembly) code for the function itself (there's no main there, as the code executes in the current process, under the Python interpreter's main function). So, it's normal for the linker to complain that there's no main symbol. It's like writing a C file that only contains:

int six() {
    return 6;
}

In order for things to work properly, you have to:

  1. Build the .asm file into an .o (object) file (done)

  2. Include the .o file from #1. into a library, which can be:

    • Static
    • Dynamic

    The library is to be linked into the (final) executable. This step is optional, as you could use the .o file directly

  3. Build another file that defines main (and calls six, which I assume is the whole purpose) into an .o file. As I'm not very comfortable with assembly, I wrote it in C

  4. Link the 2 entities (from #2. (#1.) and #3.) together

As an alternative, you could take a look at [PyData.Numba]: Compiling code ahead of time, but bear in mind that it would generate a Python (extension) module.

Back to the current problem. I ran the test on Ubuntu 18.04 64bit.

code00.py:

#!/usr/bin/env python

import sys
import math
import numba


@numba.jit(nopython=True, nogil=True)
def six():
    return 6


def main(*argv):
    six()  # Call the function(s), otherwise `inspect_asm()` would return empty dict
    speed_funcs = [
        (six, numba.int32()),
    ]
    for func, _ in speed_funcs:
        file_name_asm = "numba_{0:s}_{1:s}_{2:03d}_{3:02d}{4:02d}{5:02d}.asm".format(func.__name__, sys.platform, int(round(math.log2(sys.maxsize))) + 1, *sys.version_info[:3])
        asm = func.inspect_asm()
        print("Writing to {0:s}:".format(file_name_asm))
        with open(file_name_asm, "wb") as fout:
            for k, v in asm.items():
                print("    {0:}".format(k))
                fout.write(v.encode())


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main(*sys.argv[1:])
    print("\nDone.")
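
The file name built by the long format call in main is what build.sh later refers to, so it helps to see how its parts are derived. A minimal sketch that isolates that expression; asm_file_name is a hypothetical helper (not part of code00.py), and the example values are the ones from the test machine:

```python
import math
import sys


def asm_file_name(func_name, platform=sys.platform, maxsize=sys.maxsize,
                  version=sys.version_info[:3]):
    """Rebuild the .asm file name used by code00.py from its parts."""
    # log2(sys.maxsize) is ~63 on a 64bit build, so this yields 64 (or 32).
    bits = int(round(math.log2(maxsize))) + 1
    major, minor, micro = version
    return "numba_{0:s}_{1:s}_{2:03d}_{3:02d}{4:02d}{5:02d}.asm".format(
        func_name, platform, bits, major, minor, micro)


if __name__ == "__main__":
    # Linux, 64bit, Python 3.7.5
    print(asm_file_name("six", "linux", 2 ** 63 - 1, (3, 7, 5)))
    # numba_six_linux_064_030705.asm
```

Since build.sh hardcodes the _064_030705 suffix, it has to be adjusted when a different Python version or architecture is used.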

main00.c:

#include <stdio.h>
#include <dlfcn.h>

//#define SYMBOL_SIX "_ZN8__main__7six$241E"
#define SYMBOL_SIX "cfunc._ZN8__main__7six$241E"

typedef int (*SixFuncPtr)();

int main() {
    void *pMod = dlopen("./libnumba_six_linux.so", RTLD_LAZY);
    if (!pMod) {
        printf("Error (%s) loading module\n", dlerror());
        return -1;
    }
    SixFuncPtr pSixFunc = dlsym(pMod, SYMBOL_SIX);
    if (!pSixFunc)
    {
        printf("Error (%s) loading function\n", dlerror());
        dlclose(pMod);
        return -2;
    }
    printf("six() returned: %d\n", (*pSixFunc)());
    dlclose(pMod);
    return 0;
}
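
For comparison, the same dlopen / dlsym style lookup can be done from Python with ctypes, where the dot in the symbol name is not an obstacle because the name is passed as a string. A minimal sketch, assuming libnumba_six_linux.so has been built as in build.sh and sits in the current directory; load_c_function is a hypothetical helper, not part of Numba:

```python
import ctypes
import os


def load_c_function(lib_path, symbol, restype=ctypes.c_int, argtypes=()):
    """Load a shared library and look up a symbol by its (possibly odd) name."""
    lib = ctypes.CDLL(lib_path)      # dlopen; lib_path=None means the current process
    func = getattr(lib, symbol)      # dlsym; raises AttributeError if missing
    func.restype = restype
    func.argtypes = list(argtypes)
    return func


if __name__ == "__main__":
    lib_path = "./libnumba_six_linux.so"
    if os.path.exists(lib_path):
        six = load_c_function(lib_path, "cfunc._ZN8__main__7six$241E")
        print("six() returned: {0:d}".format(six()))
    else:
        print("{0:s} not found, run build.sh first".format(lib_path))
```

Passing None as the library path (on Linux / OSX) searches the symbols already loaded into the process, which is handy for trying the helper out on a libc function such as abs.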

build.sh:

CC=gcc

LIB_BASE_NAME=numba_six_linux

FLAG_LD_LIB_NUMBALINUX="-Wl,-L. -Wl,-l${LIB_BASE_NAME}"
FLAG_LD_LIB_PYTHON="-Wl,-L/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu -Wl,-lpython3.7m"

rm -f *.asm *.o *.a *.so *.exe

echo Generate .asm
python3 code00.py

echo Assemble
as -o ${LIB_BASE_NAME}.o ${LIB_BASE_NAME}_064_030705.asm

echo Link library
LIB_NUMBA="./lib${LIB_BASE_NAME}.so"
#ar -scr ${LIB_NUMBA} ${LIB_BASE_NAME}.o
${CC} -o ${LIB_NUMBA} -shared ${LIB_BASE_NAME}.o ${FLAG_LD_LIB_PYTHON}

echo Dump library contents
nm -S ${LIB_NUMBA}
#objdump -t ${LIB_NUMBA}

echo Compile and link executable
${CC} -o main00.exe main00.c -ldl

echo Exit script

Output:

(py_venv_pc064_03.07.05_test0) [cfati@cfati-ubtu-18-064-00:~/Work/Dev/StackOverflow/q061678226]> ~/sopr.sh
*** Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ***

[064bit prompt]>
[064bit prompt]> ls
build.sh  code00.py  main00.c
[064bit prompt]>
[064bit prompt]> ./build.sh
Generate .asm
Python 3.7.5 (default, Nov  7 2019, 10:50:52) [GCC 8.3.0] 64bit on linux

Writing to numba_six_linux_064_030705.asm:
    ()

Done.
Assemble
Link library
Dump library contents
0000000000201020 B __bss_start
00000000000008b0 0000000000000006 T cfunc._ZN8__main__7six$241E
0000000000201020 0000000000000001 b completed.7698
00000000000008e0 0000000000000014 r .const.missing Environment
00000000000008d0 0000000000000004 r .const.six
                 w __cxa_finalize
0000000000000730 t deregister_tm_clones
00000000000007c0 t __do_global_dtors_aux
0000000000200e58 t __do_global_dtors_aux_fini_array_entry
0000000000201018 d __dso_handle
0000000000200e60 d _DYNAMIC
0000000000201020 D _edata
0000000000201030 B _end
00000000000008b8 T _fini
0000000000000800 t frame_dummy
0000000000200e50 t __frame_dummy_init_array_entry
0000000000000990 r __FRAME_END__
0000000000201000 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000000008f4 r __GNU_EH_FRAME_HDR
00000000000006f0 T _init
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U PyArg_UnpackTuple
                 U PyErr_SetString
                 U PyEval_RestoreThread
                 U PyEval_SaveThread
                 U PyExc_RuntimeError
                 U PyLong_FromLongLong
0000000000000770 t register_tm_clones
0000000000201020 d __TMC_END__
0000000000201028 0000000000000008 B _ZN08NumbaEnv8__main__7six$241E
0000000000000820 0000000000000086 T _ZN7cpython8__main__7six$241E
0000000000000810 000000000000000a T _ZN8__main__7six$241E
Compile and link executable
Exit script
[064bit prompt]>
[064bit prompt]> ls
build.sh  code00.py  libnumba_six_linux.so  main00.c  main00.exe  numba_six_linux_064_030705.asm  numba_six_linux.o
[064bit prompt]>
[064bit prompt]> # Run the executable
[064bit prompt]>
[064bit prompt]> ./main00.exe
six() returned: 6
[064bit prompt]>

Also posting (since it's important) numba_six_linux_064_030705.asm:

    .text
    .file   "<string>"
    .globl  _ZN8__main__7six$241E
    .p2align    4, 0x90
    .type   _ZN8__main__7six$241E,@function
_ZN8__main__7six$241E:
    movq    $6, (%rdi)
    xorl    %eax, %eax
    retq
.Lfunc_end0:
    .size   _ZN8__main__7six$241E, .Lfunc_end0-_ZN8__main__7six$241E

    .globl  _ZN7cpython8__main__7six$241E
    .p2align    4, 0x90
    .type   _ZN7cpython8__main__7six$241E,@function
_ZN7cpython8__main__7six$241E:
    .cfi_startproc
    pushq   %rax
    .cfi_def_cfa_offset 16
    movq    %rsi, %rdi
    movabsq $.const.six, %rsi
    movabsq $PyArg_UnpackTuple, %r8
    xorl    %edx, %edx
    xorl    %ecx, %ecx
    xorl    %eax, %eax
    callq   *%r8
    testl   %eax, %eax
    je  .LBB1_3
    movabsq $_ZN08NumbaEnv8__main__7six$241E, %rax
    cmpq    $0, (%rax)
    je  .LBB1_2
    movabsq $PyEval_SaveThread, %rax
    callq   *%rax
    movabsq $PyEval_RestoreThread, %rcx
    movq    %rax, %rdi
    callq   *%rcx
    movabsq $PyLong_FromLongLong, %rax
    movl    $6, %edi
    popq    %rcx
    .cfi_def_cfa_offset 8
    jmpq    *%rax
.LBB1_2:
    .cfi_def_cfa_offset 16
    movabsq $PyExc_RuntimeError, %rdi
    movabsq $".const.missing Environment", %rsi
    movabsq $PyErr_SetString, %rax
    callq   *%rax
.LBB1_3:
    xorl    %eax, %eax
    popq    %rcx
    .cfi_def_cfa_offset 8
    retq
.Lfunc_end1:
    .size   _ZN7cpython8__main__7six$241E, .Lfunc_end1-_ZN7cpython8__main__7six$241E
    .cfi_endproc

    .globl  cfunc._ZN8__main__7six$241E
    .p2align    4, 0x90
    .type   cfunc._ZN8__main__7six$241E,@function
cfunc._ZN8__main__7six$241E:
    movl    $6, %eax
    retq
.Lfunc_end2:
    .size   cfunc._ZN8__main__7six$241E, .Lfunc_end2-cfunc._ZN8__main__7six$241E

    .type   _ZN08NumbaEnv8__main__7six$241E,@object
    .comm   _ZN08NumbaEnv8__main__7six$241E,8,8
    .type   .const.six,@object
    .section    .rodata,"a",@progbits
.const.six:
    .asciz  "six"
    .size   .const.six, 4

    .type   ".const.missing Environment",@object
    .p2align    4
".const.missing Environment":
    .asciz  "missing Environment"
    .size   ".const.missing Environment", 20


    .section    ".note.GNU-stack","",@progbits
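
The symbol names in the listing follow an Itanium C++ ABI style scheme: _ZN, then a sequence of length-prefixed name parts, then E. A small sketch that splits such names into their parts; it only handles the simple names shown here (no argument type suffix), and split_mangled is a name made up for this example:

```python
import re


def split_mangled(symbol):
    """Split a simple `_ZN<len><part>...E` symbol into its name parts."""
    # The plain C wrapper carries an extra "cfunc." prefix.
    if symbol.startswith("cfunc."):
        symbol = symbol[len("cfunc."):]
    if not (symbol.startswith("_ZN") and symbol.endswith("E")):
        raise ValueError("not a simple mangled name: " + symbol)
    body = symbol[3:-1]
    parts = []
    pos = 0
    while pos < len(body):
        match = re.match(r"\d+", body[pos:])
        if match is None:
            raise ValueError("expected a length at offset {0:d}".format(pos))
        length = int(match.group())      # "08" is read as 8
        start = pos + match.end()
        parts.append(body[start:start + length])
        pos = start + length
    return parts


if __name__ == "__main__":
    for name in ("_ZN8__main__7six$241E",
                 "_ZN7cpython8__main__7six$241E",
                 "cfunc._ZN8__main__7six$241E"):
        print(name, "->", split_mangled(name))
```

Read this way, the names consist of an optional wrapper kind (cpython), the module (__main__) and the function name with a numeric suffix (six$241).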

Notes:

  • numba_six_linux_064_030705.asm (and everything that derives from it) contains the code for the six function. Actually, there are a bunch of symbols (on OSX, you can also use the native otool -T) like:

    1. cfunc._ZN8__main__7six$241E - the (C) function itself

    2. _ZN7cpython8__main__7six$241E - the Python wrapper:

      1. Performs the C <=> Python conversions (via Python API functions like PyArg_UnpackTuple)
      2. Due to #1. it needs (depends on) libpython3.7m
      3. As a consequence, nopython=True has no effect in this case

    Also, the main part of these symbol names doesn't refer to an executable entry point (main function), but to a Python module's top level namespace (__main__). After all, this code is supposed to be run from Python

  • Because the plain C function's name contains a dot (.), I couldn't call it directly from C (it's an invalid identifier), so I had to load the .so and the function manually (dlopen / dlsym), resulting in more code than simply calling the function.
    I didn't try it, but I think the following (manual) changes to the generated .asm file would simplify the work:

    • Renaming the plain C function name (to something like __six, or any other valid C identifier that also doesn't clash with another (explicit or internal) name) in the .asm file before assembling it, would make the function directly callable from C
    • Removing the Python wrapper (#2.) would also get rid of #2.2. (the dependency on libpython3.7m)
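
The renaming suggested in the first bullet amounts to a plain text substitution on the generated file. A sketch of that idea, not verified against a real toolchain; numba_six is an arbitrary replacement name and rename_symbol a hypothetical helper:

```python
import re


def rename_symbol(asm_text, old_name, new_name):
    """Replace every whole-token occurrence of old_name in the assembly text."""
    # Symbol characters in this listing: letters, digits, "_", "." and "$".
    pattern = r"(?<![\w.$])" + re.escape(old_name) + r"(?![\w.$])"
    return re.sub(pattern, lambda _match: new_name, asm_text)


if __name__ == "__main__":
    sample = (
        "    .globl  cfunc._ZN8__main__7six$241E\n"
        "cfunc._ZN8__main__7six$241E:\n"
        "    movl    $6, %eax\n"
        "    retq\n"
    )
    print(rename_symbol(sample, "cfunc._ZN8__main__7six$241E", "numba_six"))
```

The substitution would be applied to the contents of the .asm file before the as step; the C side could then declare the function under the new name instead of going through dlsym.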


Update #0

Thanks to @PeterCordes, who shared that exact piece of info ([GNU.GCC]: Controlling Names Used in Assembler Code) that I was missing, here's a much simpler version.

main01.c:

#include <stdio.h>

extern int six() asm ("cfunc._ZN8__main__7six$241E");

int main() {
    printf("six() returned: %d\n", six());
}

Output:

[064bit prompt]> # Resume from previous point + main01.c
[064bit prompt]>
[064bit prompt]> ls
build.sh  code00.py  libnumba_six_linux.so  main00.c  main00.exe  main01.c  numba_six_linux_064_030705.asm  numba_six_linux.o
[064bit prompt]>
[064bit prompt]> ar -scr libnumba_six_linux.a numba_six_linux.o
[064bit prompt]>
[064bit prompt]> gcc -o main01.exe main01.c ./libnumba_six_linux.a -Wl,-L/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu -Wl,-lpython3.7m
[064bit prompt]>
[064bit prompt]> ls
build.sh  code00.py  libnumba_six_linux.a  libnumba_six_linux.so  main00.c  main00.exe  main01.c  main01.exe  numba_six_linux_064_030705.asm  numba_six_linux.o
[064bit prompt]>
[064bit prompt]> ./main01.exe
six() returned: 6
[064bit prompt]>
CristiFati answered Oct 14 '22 15:10