Multiprocessing for calculating eigen value

Question

I'm generating 100 random int matrices of size 1000x1000. I'm using the multiprocessing module to calculate the eigen values of the 100 matrices.

The code is given below:

import timeit
import numpy as np
import multiprocessing as mp

def calEigen():

 S, U = np.linalg.eigh(a)

def multiprocess(processes):
 pool = mp.Pool(processes=processes)
 #Start timing here as I don't want to include time taken to initialize the processes
 start = timeit.default_timer()
 results = [pool.apply_async(calEigen, args=())]
 stop = timeit.default_timer()
 print (processes":", stop - start) 

 results = [p.get() for p in results]
 results.sort() # to sort the results 


if __name__ == "__main__":

 global a
 a=[]

 for i in range(0,100):
  a.append(np.random.randint(1,100,size=(1000,1000)))

 #Print execution time without multiprocessing
 start = timeit.default_timer()
 calEigen()
 stop = timeit.default_timer()
 print stop - start 

 #With 1 process
 multiprocess(1)

 #With 2 processes
 multiprocess(2)

 #With 3 processes
 multiprocess(3)

 #With 4 processes
 multiprocess(4)

The output is

0.510247945786
('Process:', 1, 5.1021575927734375e-05)
('Process:', 2, 5.698204040527344e-05)
('Process:', 3, 8.320808410644531e-05)
('Process:', 4, 7.200241088867188e-05)

Another iteration showed this output:

 69.7296020985
 ('Process:', 1, 0.0009050369262695312)
 ('Process:', 2, 0.023727893829345703)
 ('Process:', 3, 0.0003509521484375)
 ('Process:', 4, 0.057518959045410156)

My questions are these:

Why doesn't the time execution time reduce as the number of processes increase? Am I using the multiprocessing module correctly?
Am I calculating the execution time correctly?

I have edited the code given in the comments below. I want the serial and multiprocessing functions to find the eigen values for the same list of 100 matrices. The edited code is-

import numpy as np
import time
from multiprocessing import Pool

a=[]

for i in range(0,100):
 a.append(np.random.randint(1,100,size=(1000,1000)))

def serial(z):
 result = []
 start_time = time.time()
 for i in range(0,100):    
  result.append(np.linalg.eigh(z[i])) #calculate eigen values and append to result list
 end_time = time.time()
 print("Single process took :", end_time - start_time, "seconds")


def caleigen(c):  
 result = []        
 result.append(np.linalg.eigh(c)) #calculate eigenvalues and append to result list
 return result

def mp(x):
 start_time = time.time()
 with Pool(processes=x) as pool:  # start a pool of 4 workers
  result = pool.map_async(caleigen,a)   # distribute work to workers
  result = result.get() # collect result from MapResult object
 end_time = time.time()
 print("Mutltiprocessing took:", end_time - start_time, "seconds" )

if __name__ == "__main__":

 serial(a)
 mp(1,a)
 mp(2,a)
 mp(3,a)
 mp(4,a)

There is no reduction in the time as the number of processes increases. Where am I going wrong? Does multiprocessing divide the list into chunks for the processes or do I have to do the division?

user3667217 · Accepted Answer

You're not using the multiprocessing module correctly. As @dopstar pointed out, you're not dividing your task. There is only one task for the process pool, so not matter how many workers you assigned, only one will get the job. As for your second question, I didn't use timeit to measure process time precisely. I just use time module to get a crude sense of how fast things are. It serves the purpose most of the time, though. If I understand what you're trying to do correctly, this should be the single process version of your code

import numpy as np
import time

result = []
start_time = time.time()
for i in range(100):
    a = np.random.randint(1, 100, size=(1000,1000))  #generate random matrix
    result.append(np.linalg.eigh(a))                 #calculate eigen values and append to result list
end_time = time.time()
print("Single process took :", end_time - start_time, "seconds")

The single process version took 15.27 seconds on my computer. Below is the multiprocess version, which took only 0.46 seconds on my computer. I also included the single process version for comparison. (The single process version has to be enclosed in the if block as well and placed after the multiprocess version.) Because you would like to repeat your calculation for 100 times, it'd be a lot easier to create a pool of workers and let them take on unfinished task automatically than to manually start each process and specify what each process should do. Here in my codes, the argument for the caleigen call is merely to keep track of how many times the task has been executed. Finally, map_async is generally faster than apply_async, with its downside being consuming slightly more memory and taking only one argument for function call. The reason for using map_async but not map is that in this case, the order in which result is returned does not matter and map_async is much faster than map.

from multiprocessing import Pool
import numpy as np
import time

def caleigen(x):     # define work for each worker
    a = np.random.randint(1,100,size=(1000,1000))   
    S, U = np.linalg.eigh(a)                        
    return S, U


if __name__ == "main":
    start_time = time.time()
    with Pool(processes=4) as pool:      # start a pool of 4 workers
        result = pool.map_async(caleigen, range(100))   # distribute work to workers
        result = result.get()        # collect result from MapResult object
    end_time = time.time()
    print("Mutltiprocessing took:", end_time - start_time, "seconds" )

    # Run the single process version for comparison. This has to be within the if block as well. 
    result = []
    start_time = time.time()
    for i in range(100):
        a = np.random.randint(1, 100, size=(1000,1000))  #generate random matrix
        result.append(np.linalg.eigh(a))                 #calculate eigen values and append to result list
    end_time = time.time()
    print("Single process took :", end_time - start_time, "seconds")

Multiprocessing for calculating eigen value

Tags:

python

python-multiprocessing

Misha

1 Answers

user3667217

Recent Activity

Donate For Us

Multiprocessing for calculating eigen value

Tags:

python

python-multiprocessing

Misha

1 Answers

user3667217

Related questions

Recent Activity

Donate For Us