I wrote a function using Python's multiprocessing package to try to speed up my code.
from arch.univariate import ARX, GARCH
from multiprocessing import Process
import multiprocessing
import time

def batch_learning(X, lag_array=None):
    """
    X is a time series array
    lag_array contains all possible lag numbers
    """
    # init a queue used for triggering different processes
    queue = multiprocessing.JoinableQueue()
    data = multiprocessing.Queue()

    # a worker called ARX_fit triggered by queue.get()
    def ARX_fit(queue):
        while True:
            q = queue.get()
            q.volatility = GARCH()
            print "Starting to fit lags %s" % str(q.lags.size/2)
            try:
                q_res = q.fit(update_freq=500)
            except:
                print "Error:...."
            print "finished lags %s" % str(q.lags.size/2)
            queue.task_done()

    # init four processes
    for i in range(4):
        process_i = Process(target=ARX_fit, name="Process_%s" % str(i), args=(queue,))
        process_i.start()

    # put ARX model objects into queue continuously
    for num in lag_array:
        queue.put(ARX(X, lags=num))

    # sync processes here
    queue.join()
    return
After calling the function:
batch_learning(a, lag_array=range(1,10))
However, it got stuck partway through, and I got the printout below:
Starting to fit lags 1
Starting to fit lags 3
Starting to fit lags 2
Starting to fit lags 4
finished lags 1
finished lags 2
Starting to fit lags 5
finished lags 3
Starting to fit lags 6
Starting to fit lags 7
finished lags 4
Starting to fit lags 8
finished lags 6
finished lags 5
Starting to fit lags 9
It runs forever with no further printouts on my Mac OS El Capitan. Using PyCharm's debug mode, and thanks to Tim Peters's suggestions, I found that the processes had actually quit unexpectedly. Under debug mode I could pinpoint the cause: the svd function inside numpy.linalg.pinv(), which the arch library uses. So my question is: why? The code works in a single-process for-loop, but it cannot work with 2 or more processes. I don't know how to fix this problem. Is it a numpy bug? Can anyone help me here?
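To take arch out of the picture, here is a minimal sketch of what I believe is the same failure mode (a reduction of my own, with arbitrary matrix sizes, not code from arch): forked worker processes calling numpy.linalg.svd directly, the same routine numpy.linalg.pinv() uses internally.

import multiprocessing
import numpy as np

def svd_worker(i):
    # the same LAPACK-backed call that pinv() makes internally
    a = np.random.randn(200, 200)
    np.linalg.svd(a)
    print("worker %d done" % i)

if __name__ == "__main__":
    # touch BLAS in the parent first; using Accelerate on both sides
    # of a fork() is what typically triggers the problem
    np.linalg.svd(np.random.randn(50, 50))
    procs = [multiprocessing.Process(target=svd_worker, args=(i,))
             for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

On an Accelerate-linked numpy this can stall or die silently in the children; on an OpenBLAS build it prints four "done" lines and exits.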
I am answering my own question to share my solution. I have already solved this issue, thanks to the help from @Tim Peters and @aganders.
multiprocessing usually hangs when you use numpy/scipy on Mac OS because of the Accelerate framework, Apple's replacement for the OpenBLAS that numpy is normally built on. Accelerate is not safe to use across a fork(): BLAS calls made in forked child processes can crash or hang. To solve this kind of problem, do the following:
1. Follow the procedure on this link to rebuild numpy with OpenBLAS.
2. Reinstall scipy and test your code to see if it works. (You can verify the new BLAS backend with the snippet below.)
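To check which BLAS backend your numpy is actually linked against, use numpy's built-in config dump and look for "accelerate" versus "openblas" in the library entries:

import numpy as np

# prints the build-time BLAS/LAPACK configuration;
# the backend name appears in the library entries
np.show_config()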
A heads-up for testing multiprocessing code on Mac OS: when you run your code, it is better to set an environment variable first:
OPENBLAS_NUM_THREADS=1 python import_test.py
The reason for doing this is that OpenBLAS by default creates 2 threads per core, in which case 8 threads are running (2 per core) even though you only set up 4 processes. This adds some thread-switching overhead. I tested the OPENBLAS_NUM_THREADS=1 configuration to limit each process to 1 thread per core, and it is indeed faster than the default settings.
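If you would rather not depend on the shell, the same limit can be set from inside the script. A small sketch (this assumes the variable is set before numpy is first imported, since OpenBLAS reads it when the library loads):

import os
# must run before the first "import numpy" anywhere in the process
os.environ["OPENBLAS_NUM_THREADS"] = "1"

import numpy as np  # numpy (and OpenBLAS) now load with 1 thread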