Running Python on multiple cores

I have created a (rather large) program that takes quite a long time to finish, so I started looking into ways to speed it up.

I found that, if I open Task Manager while the program is running, only one core is being used.

After some research, I found this question: "Why does multiprocessing use only a single core after I import numpy?", which gives the solution os.system("taskset -p 0xff %d" % os.getpid()). However, this doesn't work for me, and my program continues to run on a single core.
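For context: taskset is a Linux command and does not exist on Windows, where Task Manager runs. CPU affinity also only controls which cores the operating system may schedule a process on; a single Python process still runs its Python code on one core at a time, so widening its affinity cannot by itself spread the work across cores. The following is a minimal sketch of the in-Python equivalent of that taskset call, assuming a Linux system; the function name allow_all_cores is hypothetical.

```python
import os


def allow_all_cores():
    """Allow this process to be scheduled on every core (Linux only).

    Roughly what `taskset -p 0xff <pid>` does on an 8-core machine.
    Returns the resulting set of allowed CPU ids, or None where the
    affinity API is unavailable (e.g. Windows, macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    all_cores = range(os.cpu_count() or 1)
    os.sched_setaffinity(0, all_cores)  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == '__main__':
    print(allow_all_cores())
```

Even when this succeeds, the program only uses more than one core once it actually creates several processes, which is what multiprocessing is for.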

I then found the question "Is Python capable of running on multiple cores?", which pointed towards using multiprocessing.

So after looking into multiprocessing, I came across this documentation on how to use it: https://docs.python.org/3/library/multiprocessing.html#examples

I tried the code:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

a = input("Finished")

After running the code (not in IDLE), it printed this:

Finished
hello bob
Finished

Note: after it printed "Finished" the first time, I pressed Enter.

So now I am even more confused, and I have two questions:

First: it still doesn't run on multiple cores (I have an 8-core Intel i7).

Second: why does it print "Finished" before it has even run the code in the if statement (and it isn't even finished yet!)?

RulerOfTheWorld asked Aug 25 '17 18:08


1 Answer

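To answer your second question first, "Finished" is printed because a = input("Finished") is outside of your if __name__ == '__main__': code block, so it runs whenever the module is loaded, not only when the script is run as the main program. On Windows, multiprocessing starts each child by launching a fresh Python interpreter that re-imports your script (the "spawn" start method). In that child, __name__ is not '__main__', so the if block is skipped, but the module-level input("Finished") still runs. That is the first "Finished" you saw: the child waits for Enter before it calls f and prints "hello bob". The second "Finished" comes from the parent process once p.join() returns.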
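A minimal sketch that makes the re-import visible: the module-level print runs in the parent and again in the spawned child, where __name__ is '__mp_main__' rather than '__main__'. It requests the spawn start method explicitly through mp.get_context('spawn') so the behavior matches Windows on any OS; run it as a script file, not in IDLE or an interactive session.

```python
import multiprocessing as mp
import os

# Module-level statement outside the __main__ guard. With the "spawn" start
# method, each child process re-imports this file, so this line runs once in
# the parent and once more in every child.
print(f"module-level code: __name__={__name__!r}, pid={os.getpid()}")


def f(name):
    print('hello', name)


if __name__ == '__main__':
    # Use a spawn context explicitly (the default on Windows) without
    # changing the global start method.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=f, args=('bob',))
    p.start()
    p.join()
```

The same rule applies to any side effect (input(), file writes, heavy setup): if it must happen only once, put it inside the if __name__ == '__main__': block.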

To answer the first question, you created only one process, which you start and then immediately wait on with join() before continuing. This gives you none of the benefits of multiprocessing and adds the overhead of creating a new process.

Because you want several processes running at once, you need to build your own pool of workers in a collection of some sort (e.g. a Python list) and start all of the processes before waiting on any of them.

In practice, you need to consider more than the number of processors (for example, the amount of available memory and the ability to restart workers that crash). However, here is a simple example that accomplishes the task above.

import datetime as dt
from multiprocessing import Process, current_process
import sys

def f(name):
    print('{}: hello {} from {}'.format(
        dt.datetime.now(), name, current_process().name))
    sys.stdout.flush()

if __name__ == '__main__':
    worker_count = 8
    worker_pool = []
    for _ in range(worker_count):
        p = Process(target=f, args=('bob',))
        p.start()
        worker_pool.append(p)
    for p in worker_pool:
        p.join()  # Wait for all of the workers to finish.

    # Allow time to view results before program terminates.
    a = input("Finished")  # raw_input(...) in Python 2.
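The standard library also provides a ready-made pool, multiprocessing.Pool, which keeps a fixed number of worker processes alive and hands tasks out to them, so you do not have to manage the list of Process objects yourself. A minimal sketch, assuming a CPU-bound workload; count_primes_below and run_parallel are hypothetical names standing in for your real work:

```python
import os
from multiprocessing import Pool


def count_primes_below(n):
    """Hypothetical CPU-bound task: count primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_parallel(limits):
    # One worker process per CPU core; map() splits `limits` across the
    # workers and returns the results in the same order as the inputs.
    with Pool(processes=os.cpu_count()) as pool:
        return pool.map(count_primes_below, limits)


if __name__ == '__main__':
    print(run_parallel([10_000] * 8))
```

Because each task is pure CPU work with no shared state, every worker process can occupy its own core, which is what you should then see in Task Manager.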

Also note that if you join each worker immediately after starting it, you wait for that worker to complete its task before starting the next one. This is generally undesirable unless the tasks must run sequentially.

Typically Wrong

worker_1.start()
worker_1.join()

worker_2.start()  # Must wait for worker_1 to complete before starting worker_2.
worker_2.join()

Usually Desired

worker_1.start()
worker_2.start()  # Start all workers.

worker_1.join()
worker_2.join()   # Wait for all workers to finish.
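To make the difference between the two patterns measurable, here is a sketch that times both with the same CPU-bound task. busy_work, run_sequential and run_concurrent are hypothetical names, and the timings depend on your machine and core count.

```python
import time
from multiprocessing import Process


def busy_work(n=3_000_000):
    """Hypothetical CPU-bound task: sum of squares below n.

    The return value is discarded when run via Process; it only burns CPU.
    """
    return sum(i * i for i in range(n))


def run_sequential(count):
    # "Typically wrong": start and join each worker before the next one.
    start = time.perf_counter()
    for _ in range(count):
        p = Process(target=busy_work)
        p.start()
        p.join()
    return time.perf_counter() - start


def run_concurrent(count):
    # "Usually desired": start every worker first, then join them all.
    start = time.perf_counter()
    workers = [Process(target=busy_work) for _ in range(count)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    return time.perf_counter() - start


if __name__ == '__main__':
    n = 4
    print(f"start/join one at a time: {run_sequential(n):.2f} s")
    print(f"start all, then join all: {run_concurrent(n):.2f} s")
```

On a machine with at least four free cores, the second timing should be noticeably shorter because the workers overlap; with the first pattern the total is roughly the sum of the individual run times.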

For more information, please refer to the following links:

  • https://docs.python.org/3/library/multiprocessing.html
  • Dead simple example of using Multiprocessing Queue, Pool and Locking
  • https://pymotw.com/2/multiprocessing/basics.html
  • https://pymotw.com/2/multiprocessing/communication.html
  • https://pymotw.com/2/multiprocessing/mapreduce.html
Alexander answered Sep 28 '22 18:09