I wrote a function using Python's multiprocessing package to try to speed up my code.
from arch.univariate import ARX, GARCH
from multiprocessing import Process
import multiprocessing
import time

def batch_learning(X, lag_array=None):
    """
    X is a time series array
    lag_array contains all possible lag numbers
    """
    # init a queue used for triggering different processes
    queue = multiprocessing.JoinableQueue()
    data = multiprocessing.Queue()

    # a worker called ARX_fit triggered by queue.get()
    def ARX_fit(queue):
        while True:
            q = queue.get()
            q.volatility = GARCH()
            print "Starting to fit lags %s" % str(q.lags.size/2)
            try:
                q_res = q.fit(update_freq=500)
            except:
                print "Error:...."
            print "finished lags %s" % str(q.lags.size/2)
            queue.task_done()

    # init four processes
    for i in range(4):
        process_i = Process(target=ARX_fit, name="Process_%s" % str(i), args=(queue,))
        process_i.start()

    # put ARX model objects into queue continuously
    for num in lag_array:
        queue.put(ARX(X, lags=num))

    # sync processes here
    queue.join()
    return
After calling the function:
batch_learning(a, lag_array=range(1,10))
However, it got stuck partway through, and I got the printout below:
Starting to fit lags 1
Starting to fit lags 3
Starting to fit lags 2
Starting to fit lags 4
finished lags 1
finished lags 2
Starting to fit lags 5
finished lags 3
Starting to fit lags 6
Starting to fit lags 7
finished lags 4
Starting to fit lags 8
finished lags 6
finished lags 5
Starting to fit lags 9
It runs forever with no further printouts on my Mac OS El Capitan. Using PyCharm's debug mode, and thanks to Tim Peters's suggestions, I found that the processes had actually quit unexpectedly. Under debug mode I could pinpoint the cause: the svd function inside numpy.linalg.pinv(), which the arch library uses. So my question is: why? The code works in a single-process for-loop, but it cannot work with 2 or more processes. I don't know how to fix this problem. Is it a numpy bug? Can anyone help me here?
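To take arch out of the picture, here is a minimal sketch of what I believe is the same failure mode (a reduction of my own, with arbitrary matrix sizes, not code from arch): forked worker processes calling numpy.linalg.svd directly, the same routine numpy.linalg.pinv() uses internally.

import multiprocessing
import numpy as np

def svd_worker(i):
    # the same LAPACK-backed call that pinv() makes internally
    a = np.random.randn(200, 200)
    np.linalg.svd(a)
    print("worker %d done" % i)

if __name__ == "__main__":
    # touch BLAS in the parent first; using Accelerate on both sides
    # of a fork() is what typically triggers the problem
    np.linalg.svd(np.random.randn(50, 50))
    procs = [multiprocessing.Process(target=svd_worker, args=(i,))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

On an Accelerate-linked numpy this can stall or die silently in the children; on an OpenBLAS build it prints four "done" lines and exits.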
I am answering my own question to share my solution. I have already solved this issue, thanks to the help from @Tim Peters and @aganders.
multiprocessing usually hangs when you use numpy/scipy on Mac OS because of the Accelerate framework, Apple's replacement for the OpenBLAS that numpy is normally built on. Accelerate is not safe to use across a fork(): BLAS calls made in forked child processes can crash or hang. To solve this kind of problem, do the following:
1. Follow the procedure on this link to rebuild numpy with OpenBLAS.
2. Reinstall scipy and test your code to see if it works. (You can verify the new BLAS backend with the snippet below.)
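To check which BLAS backend your numpy is actually linked against, use numpy's built-in config dump and look for "accelerate" versus "openblas" in the library entries:

import numpy as np

# prints the build-time BLAS/LAPACK configuration;
# the backend name appears in the library entries
np.show_config()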
A heads-up for testing multiprocessing code on Mac OS: when you run your code, it is better to set an environment variable first:
OPENBLAS_NUM_THREADS=1 python import_test.py
The reason for doing this is that OpenBLAS by default creates 2 threads per core, in which case 8 threads are running (2 per core) even though you only set up 4 processes. This adds some thread-switching overhead. I tested the OPENBLAS_NUM_THREADS=1 configuration to limit each process to 1 thread per core, and it is indeed faster than the default settings.
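If you would rather not depend on the shell, the same limit can be set from inside the script. A small sketch (this assumes the variable is set before numpy is first imported, since OpenBLAS reads it when the library loads):

import os
# must run before the first "import numpy" anywhere in the process
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np  # numpy (and OpenBLAS) now load with 1 thread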