 

Weird multiprocessing block importing Numba function

Tags:

python

numba

Environment

  • GNU/Linux (Fedora 25).
  • Conda environment.
  • Python 3.6.1.
  • Numba 0.33.0 (np112py36_0).

Initial setup (works fine)

There are two files, main.py and numbamodule.py:

main.py

It spawns 2 processes that run the execute_numba function.

import time
from importlib import import_module
from multiprocessing import Process


def execute_numba(name):
    # Import the function
    importfunction = 'numbamodule.numba_function'
    module = import_module(importfunction.split('.')[0])
    function = getattr(module, importfunction.split('.')[-1])
    while True:
        print(str(name) + ' - executing Numba function...')
        # Execute the function
        function(10)
        time.sleep(0.1)


if __name__ == '__main__':
    processes = [Process(target=execute_numba, args=(i,)) for i in range(2)]
    for p in processes:
        p.start()
    time.sleep(1)
    for p in processes:
        p.terminate()
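A side note on the dynamic import in execute_numba: splitting the path with `split('.')[0]` and `split('.')[-1]` only works when the function lives in a top-level module. A path like 'package.module.function' would import 'package' and then look up 'function' on it. The sketch below uses a hypothetical helper, `load_function`, that splits only on the last dot; it is an illustration, not part of the original code, and it does not change the blocking behavior described here.

```python
from importlib import import_module


def load_function(path):
    """Resolve a dotted path such as 'numbamodule.numba_function'.

    Hypothetical helper: only the last component is treated as the
    attribute name, so nested modules like 'os.path.join' also work.
    """
    module_name, _, attr_name = path.rpartition('.')
    if not module_name:
        raise ValueError('expected "module.attribute", got %r' % path)
    module = import_module(module_name)
    return getattr(module, attr_name)


if __name__ == '__main__':
    # Stand-in for 'numbamodule.numba_function' so this runs without Numba.
    join = load_function('os.path.join')
    print(join('a', 'b'))
```

In execute_numba, the two lines using `split('.')` would become `function = load_function(importfunction)`.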

numbamodule.py

It defines a simple function, numba_function:

import numba


@numba.jit()
def numba_function(x):
    total = 0
    for i in range(x):
        total += i
    return total

I can run the main.py script and see both processes printing:

$ python main.py
0 - executing Numba function...
1 - executing Numba function...
0 - executing Numba function...
1 - executing Numba function...
0 - executing Numba function...
1 - executing Numba function...
[...]

Breaking it

The way I break it is a bit weird, but this is what I stumbled upon while trying to minimize a reproducible test case. Please tell me if you can reproduce the same behavior.

In main.py, I just add one of the imports proposed below after the Process import (i.e., uncomment one line and try):

import time
from importlib import import_module
from multiprocessing import Process

#
# Adding one of the import lines below results in a block...
# (you may need to install the packages first in the virtual environment)
#
#import matplotlib
#import Pyro4
#import scipy
#import dill


def execute_numba(name):
    # [...]

Then one process may block in the execute_numba function (specifically, at the import_module() call):

$ python main.py 
1 - executing Numba function...
1 - executing Numba function...
1 - executing Numba function...
1 - executing Numba function...
1 - executing Numba function...
1 - executing Numba function...
[...]

For me, the matplotlib and Pyro4 imports "work" best. Even so, I cannot get the block in 100% of the runs... :-/

Note that I am simply adding a single import line, not actually using the package. Some other external imports result in a block as well, but I have found that the ones proposed above "work" best (block the most).

What is happening?

First of all, can you reproduce the same behavior? (I am especially interested in non-virtualized GNU/Linux machines.)

I don't know how to debug this or why it could be happening. Any ideas?

The fact that adding a single, unrelated import xxx triggers the block scares me and makes little sense to me. Could this depend on timing/delays, and is that why some imports break it while others do not?

Notes

  • As you can see there is no traceback, the process just blocks.
  • If I remove import numba and @numba.jit from numbamodule.py, it always works, so maybe it has something to do with Numba?
  • I can reproduce the same behavior also with older Numba/Python versions. Tried with Numba 0.25.0 and 0.22.1 (both with Python 3.5.3).

Updates

  • 2017-07-03: Just to be clear, I am not looking for a workaround (I already have one in the real code). I am genuinely interested in knowing how to proceed in a case like this: understanding what is going on, and learning how to debug and find the problem so that I can report it if it is a broken package/build/environment. How would you proceed?
  • 2017-07-10: The block occurs in particular at the import_module() call.
  • 2017-07-11: Numba issue acknowledged.
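One generic way to approach the "how would you debug this" question is to make each child report where it is when something goes wrong. The sketch below uses the standard library `faulthandler` module: `faulthandler.enable()` prints the Python traceback if the process receives a fatal signal such as SIGSEGV, and `faulthandler.dump_traceback_later()` acts as a watchdog that dumps all thread tracebacks if the loop stops coming back within a timeout. The names `traced_worker`, `busy` and `hang_timeout` are hypothetical, and `busy()` stands in for the real `numba_function(10)` call so the sketch runs without Numba; it is a diagnostic aid under those assumptions, not a fix.

```python
import faulthandler
import sys
import time
from multiprocessing import Process


def traced_worker(name, work, iterations=None, hang_timeout=5.0):
    """Call `work()` in a loop with faulthandler diagnostics enabled.

    Sketch only: `traced_worker`, `work` and `hang_timeout` are hypothetical
    names, not part of the original code.
    """
    # Print the Python traceback of every thread to stderr if this process
    # receives a fatal signal such as SIGSEGV.
    faulthandler.enable(file=sys.__stderr__, all_threads=True)
    count = 0
    while iterations is None or count < iterations:
        # (Re)arm the watchdog: each call replaces the previous timer, so the
        # tracebacks are only dumped if one iteration takes longer than
        # `hang_timeout` seconds, i.e. the process is genuinely stuck.
        faulthandler.dump_traceback_later(hang_timeout, exit=False,
                                          file=sys.__stderr__)
        print(str(name) + ' - executing work...', flush=True)
        work()
        time.sleep(0.1)
        count += 1
    faulthandler.cancel_dump_traceback_later()
    return count


def busy():
    # Stand-in for numba_function(10) so the sketch runs without Numba.
    return sum(range(10))


if __name__ == '__main__':
    processes = [Process(target=traced_worker, args=(i, busy))
                 for i in range(2)]
    for p in processes:
        p.start()
    time.sleep(1)
    for p in processes:
        p.terminate()
```

If a child crashes, stderr should show a "Fatal Python error" message with the traceback of the crashing thread; if it is genuinely blocked, the watchdog output shows the line it is stuck on.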
Asked Jun 26 '17 at 16:06 by Peque




1 Answer

It seems it was a Numba bug, acknowledged in issue 2431.

It seems to be fixed now. If you run into this, update your numba and llvmlite installations. If that does not fix the problem, you should probably comment on that issue to get it reopened.
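Before updating (or before commenting on the issue), it helps to record exactly which numba and llvmlite versions are installed. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; on older interpreters such as the Python 3.6 used in the question, `numba.__version__` and `llvmlite.__version__` give the same information. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    for name in ('numba', 'llvmlite'):
        print(name, installed_version(name) or 'not installed')
```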

As @stuartarchibald commented:

[...] it looks like one processed is blocked is because it has in actual fact segfaulted [...]

[...] Segfaults appearing from this location are almost always due to threads performing concurrent operations inside LLVM, or some issue to do with installing functions during Numba's initialisation sequence. [...]

[...] cannot reproduce any more with llvmlite==0.22.0dev0 and numba==0.37.0.dev [...]
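On Linux, the parent process can tell a child that is still looping apart from one that has crashed by looking at `Process.exitcode` after joining it: multiprocessing reports `None` while the child runs, the exit status if it exited normally, and `-N` if it was terminated by signal N. The sketch below uses a hypothetical helper, `describe_exit`, and a stand-in `worker` instead of the real execute_numba, so it runs without Numba. With the question's setup, a child that was merely looping should report SIGTERM (sent by `terminate()`), while one that had already segfaulted should report SIGSEGV.

```python
import signal
import time
from multiprocessing import Process


def describe_exit(exitcode):
    """Turn multiprocessing.Process.exitcode into a readable message.

    exitcode is None while the child is running, the exit status once it
    has exited normally, and -N if it was terminated by signal N.
    """
    if exitcode is None:
        return 'still running'
    if exitcode >= 0:
        return 'exited with status %d' % exitcode
    try:
        name = signal.Signals(-exitcode).name
    except ValueError:
        name = 'signal %d' % -exitcode
    return 'killed by ' + name


def worker(name):
    # Stand-in for execute_numba(name): loops until it is terminated.
    while True:
        time.sleep(0.1)


if __name__ == '__main__':
    processes = [Process(target=worker, args=(i,)) for i in range(2)]
    for p in processes:
        p.start()
    time.sleep(1)
    for p in processes:
        p.terminate()
    for p in processes:
        p.join()
        # A child that was merely looping reports SIGTERM (from terminate());
        # one that had already crashed would report SIGSEGV instead.
        print(p.name, describe_exit(p.exitcode))
```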

Answered Oct 22 '22 at 16:10 by Peque
