Python multiprocessing doesn't seem to use more than one core

I want to use Python multiprocessing to run a grid search for a predictive model. When I look at core usage, it always seems to be using only one core. Any idea what I'm doing wrong?

    import multiprocessing
    import itertools
    from operator import itemgetter

    from sklearn import svm

    # first read some data
    # X will be my feature NumPy 2D array
    # y will be my 1D NumPy array of labels

    # define the grid
    C = [0.1, 1]
    gamma = [0.0]
    params = [C, gamma]
    grid = list(itertools.product(*params))
    GRID_hx = []

    def worker(par, grid_list):
        # define a sklearn model
        clf = svm.SVC(C=par[0], gamma=par[1], probability=True, random_state=SEED)
        # run a cross-validation function: returns error
        ll = my_cross_validation_function(X, y, model=clf, n=1, test_size=0.2)
        print(par, ll)
        grid_list.append((par, ll))

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        GRID_hx = manager.list()
        jobs = []
        for g in grid:
            p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
            jobs.append(p)
            p.start()
            p.join()

        print("\n-------------------")
        print("SORTED LIST")
        print("-------------------")
        L = sorted(GRID_hx, key=itemgetter(1))
        for l in L[:5]:
            print(l)
People also ask

Does Python multiprocessing use multiple cores?

Python is not a single-threaded language, but Python processes typically use a single thread because of the GIL. Despite the GIL, libraries that perform computationally heavy tasks, such as NumPy, SciPy and PyTorch, use C-based implementations under the hood, allowing the use of multiple cores.
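For CPU-bound pure-Python code, multiprocessing is what actually spreads work across cores. A minimal sketch (the busy_work function and its workload are made up for illustration):

    import multiprocessing

    def busy_work(n):
        # CPU-bound pure-Python loop; threads would serialize on the GIL here
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == '__main__':
        # each task runs in its own process, so each can occupy its own core
        with multiprocessing.Pool() as pool:
            results = pool.map(busy_work, [10_000_000] * multiprocessing.cpu_count())
        print(results)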

Does multiprocessing require multiple cores?

Common research programming languages use only one processor by default. The "multi" in multiprocessing refers to the multiple cores in a computer's central processing unit (CPU). Computers originally had only one CPU core or processor, which is the unit that performs all the mathematical calculations.

Is multiprocessing a good idea in Python?

An excellent solution is to use multiprocessing, rather than multithreading, where work is split across separate processes, allowing the operating system to manage access to shared resources. This also gets around one of the notorious Achilles' heels of Python: the Global Interpreter Lock (aka the GIL).
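As an illustration, the standard library's concurrent.futures exposes the same idea behind a high-level interface. A minimal sketch, with cpu_bound standing in for any heavy computation:

    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound(x):
        # stand-in for any heavy, pure-Python computation
        return sum(i * i for i in range(x))

    if __name__ == '__main__':
        # tasks are distributed over a pool of worker processes, sidestepping the GIL
        with ProcessPoolExecutor() as executor:
            results = list(executor.map(cpu_bound, [2_000_000, 3_000_000, 4_000_000]))
        print(results)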


1 Answer

Your problem is that you join each job immediately after starting it:

    for g in grid:
        p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
        jobs.append(p)
        p.start()
        p.join()

join blocks until the respective process has finished working. This means that your code starts only one process at a time, waits until it has finished, and only then starts the next one.

In order for all processes to run in parallel, you need to first start them all and then join them all:

    jobs = []
    for g in grid:
        p = multiprocessing.Process(target=worker, args=(g, GRID_hx))
        jobs.append(p)
        p.start()

    for j in jobs:
        j.join()

Documentation: https://docs.python.org/3/library/multiprocessing.html
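As an aside, multiprocessing.Pool can express the same fan-out-and-collect pattern without managing Process objects or a Manager list by hand. A sketch, assuming worker is refactored to return its result (X, y, SEED and my_cross_validation_function are the placeholders from the question):

    import multiprocessing
    from operator import itemgetter

    from sklearn import svm

    def worker(par):
        # same model fit as before, but the result is returned
        # instead of appended to a shared list
        clf = svm.SVC(C=par[0], gamma=par[1], probability=True, random_state=SEED)
        ll = my_cross_validation_function(X, y, model=clf, n=1, test_size=0.2)
        return (par, ll)

    if __name__ == '__main__':
        # the pool sizes itself to the number of cores and
        # runs the grid points in parallel, preserving order
        with multiprocessing.Pool() as pool:
            GRID_hx = pool.map(worker, grid)
        L = sorted(GRID_hx, key=itemgetter(1))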
