I don't understand why my computation takes longer when I use 28-30 cores than when I use 12-16 cores on an AWS EC2 c3.8xlarge. I ran some tests and the results are in the chart below:
https://www.dropbox.com/s/8u32jttxmkvnacd/Slika%20zaslona%202015-01-11%20u%2018.33.20.png?dl=0
The fastest computation is with 13 cores, and using the maximum number of cores takes about as long as using 8 cores of the c3.8xlarge:
https://www.dropbox.com/s/gf3bevbi8dwk5vh/Slika%20zaslona%202015-01-11%20u%2018.32.53.png?dl=0
This is a simplified version of the code I use:
import random
import multiprocessing as mp
import threading as th
import numpy as np

x = mp.Value('f', 0)
y = mp.Value('f', 0)
arr = []
tasks = []
nesto = []  # unused

def calculation2(some_array):
    # runs in a pool worker process
    global x, y, arr
    p = False
    a = np.sum(some_array) * random.random()
    b = a ** (random.random())
    if a > x.value:
        x.value = a
        y.value = b
        arr = some_array
        p = True
    if p:
        return x.value, y.value, arr
    # returns None when p is False; the unpacking error in
    # exec_activator is then swallowed by the bare except

def calculation1(number_of_pool):
    # generates tasks and submits them to the pool
    global tasks
    pool = mp.Pool(number_of_pool)
    for i in range(1, 500):
        some_array = np.random.randint(100, size=(1, 4))
        tasks += [pool.apply_async(calculation2, args=(some_array,))]

def exec_activator():
    # collects results while the generator thread is alive or tasks remain
    global x, y, arr
    while tasks_gen.is_alive() or len(tasks) > 0:
        try:
            task = tasks.pop(0)
            x.value, y.value, arr = task.get()
        except:
            pass

def results(task_act):
    # waits for the collector thread to finish, then prints the best result
    while task_act.is_alive():
        pass
    else:
        print(x.value)
        print(y.value)
        print(arr)

tasks_gen = th.Thread(target=calculation1, args=(4,))
task_act = th.Thread(target=exec_activator)
result_print = th.Thread(target=results, args=(task_act,))

tasks_gen.start()
task_act.start()
result_print.start()
At its core are the two calculations. The goal of the code is to find the array that produces the maximum x, and to return its y. The two calculations run simultaneously (with threading) because otherwise there are sometimes too many arrays, which take up too much RAM.
My goal is the fastest possible computation, and I need advice on how to use all cores if possible.
Sorry in advance for my bad English. If you need more information, please ask.
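As a hedged sketch (not the asker's actual code, and with a made-up scoring function standing in for calculation2): one way to pursue the same goal without threads or shared mp.Value state is to have each worker return its candidate (x, y, arr) tuple and let the parent reduce with max, which sidesteps lock contention on the shared values.

```python
import random
import multiprocessing as mp
import numpy as np

def score(some_array):
    # stand-in for calculation2: compute (x, y) for one candidate array
    a = np.sum(some_array) * random.random()
    b = a ** random.random()
    return a, b, some_array

def find_best(n_workers, n_tasks=500):
    # generate the candidate arrays up front, then map over the pool;
    # the reduction happens in the parent, so no shared state is needed
    arrays = [np.random.randint(100, size=(1, 4)) for _ in range(n_tasks)]
    with mp.Pool(n_workers) as pool:
        results = pool.map(score, arrays)
    return max(results, key=lambda r: r[0])  # tuple with the largest x

if __name__ == '__main__':
    x, y, arr = find_best(4)
    print(x, y, arr)
```

If RAM is the concern, pool.imap_unordered with a generator of arrays keeps only a bounded number of candidates in flight instead of materializing all of them.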
The c3.8xlarge is an Ivy Bridge-based system that uses Hyper-Threading: its 32 vCPUs map to 16 physical cores, so it doesn't really have 32 independent processing units.
There's often no point in trying to parallelize a CPU-bound task across more OS processes than there are processors in the hardware. In fact, it's quite often detrimental due to the resource overhead and context switching (which is what you're seeing).
The sweet spot depends on your specific application, and experimentation will help you find it (which it sounds like you've done).