Multithreading inside Multiprocessing in Python

Question

I am using the concurrent.futures module for multiprocessing and multithreading. I am running it on an 8-core machine with 16 GB RAM and an Intel i7 8th-gen processor. I tried this on Python 3.7.2 and also on Python 3.8.2.

import concurrent.futures
import time
# Takes a list and multiplies each element by 2
def double_value(x):
  y = []
  for elem in x:
    y.append(2 * elem)
  return y

# Multiplies a single element by 2
def double_single_value(x):
  return 2 * x

# Define the input array: 100 rows of 1,000,000 elements each

import numpy as np
a = np.arange(100000000).reshape(100, 1000000)

# Runs a thread pool over one row, multiplying each element by 2
def get_double_value(x):
  with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(double_single_value, x)
  return list(results)

# The code below ran in 115 seconds. It uses only multiprocessing. CPU utilization for this piece of code is 100%.

t = time.time()

with concurrent.futures.ProcessPoolExecutor() as executor:
  my_results = executor.map(double_value, a)
print(time.time()-t)

# The code below took more than 9 minutes and consumed all of the system's RAM, after which the system killed all the processes. CPU utilization during this piece of code did not reach 100% (~85%).

t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
  my_results = executor.map(get_double_value, a)

print(time.time()-t)

I really want to understand:

Why is the code that first splits the work across multiple processes and then runs multithreading inside each one not faster than the code that uses only multiprocessing? (I have gone through many posts that describe multiprocessing and multithreading, and the crux I took away is that multithreading is for I/O-bound work and multiprocessing is for CPU-bound work.)
Is there a better way of doing multithreading inside multiprocessing to make maximum use of the allotted cores (or CPUs)?
Why did the last piece of code consume all the RAM? Was it due to multithreading?

Lucas Vazquez · Accepted Answer

You can mix concurrency with parallelism, and there can be valid reasons to do so. Imagine a batch of requests you have to make while processing their responses (e.g., converting XML to JSON) as fast as possible.

I did some tests and here are the results. In each test, I combine different approaches to execute a print 16000 times, each after a one-second sleep (I have 8 cores and 16 threads).

Parallelism with `multiprocessing`, concurrency with `asyncio`

The fastest, 1.1152372360229492 sec.

import asyncio
import multiprocessing
import os
import psutil
import threading
import time

async def print_info(value):
    await asyncio.sleep(1)
    print(
        f"THREAD: {threading.get_ident()}",
        f"PROCESS: {os.getpid()}",
        f"CORE_ID: {psutil.Process().cpu_num()}",
        f"VALUE: {value}",
    )

async def await_async_logic(values):
    await asyncio.gather(
        *(
            print_info(value)
            for value in values
        )
    )

def run_async_logic(values):
    asyncio.run(await_async_logic(values))

def multiprocessing_executor():
    start = time.time()
    with multiprocessing.Pool() as multiprocessing_pool:
        multiprocessing_pool.map(
            run_async_logic,
            (range(1000 * x, 1000 * (x + 1)) for x in range(os.cpu_count())),
        )
    end = time.time()
    print(end - start)

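if __name__ == "__main__":
    # Guard needed where worker processes are started with "spawn" (e.g., Windows, macOS)
    multiprocessing_executor()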

Very important note: with asyncio I can spam tasks as much as I want. For example, I can change the value from 1000 to 10000 to generate 160000 prints and there is no problem (I tested it and it took me 2.0210490226745605 sec).

Parallelism with `multiprocessing`, concurrency with `threading`

An alternative option, 1.6983509063720703 sec.

import multiprocessing
import os
import psutil
import threading
import time

def print_info(value):
    time.sleep(1)
    print(
        f"THREAD: {threading.get_ident()}",
        f"PROCESS: {os.getpid()}",
        f"CORE_ID: {psutil.Process().cpu_num()}",
        f"VALUE: {value}",
    )

def multithreading_logic(values):
    threads = []
    for value in values:
        threads.append(threading.Thread(target=print_info, args=(value,)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

def multiprocessing_executor():
    start = time.time()
    with multiprocessing.Pool() as multiprocessing_pool:
        multiprocessing_pool.map(
            multithreading_logic,
            (range(1000 * x, 1000 * (x + 1)) for x in range(os.cpu_count())),
        )
    end = time.time()
    print(end - start)

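if __name__ == "__main__":
    # Guard needed where worker processes are started with "spawn" (e.g., Windows, macOS)
    multiprocessing_executor()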

Very important note: with this method I can NOT spam as many tasks as I want. If I change the value from 1000 to 10000 I get RuntimeError: can't start new thread. I am also surprised, because I expected this method to beat asyncio in every aspect, but the opposite turned out to be true.
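The RuntimeError appears because this version starts one OS thread per value, so 10000 values per process means 10000 threads per process. One way to keep the thread count bounded is to reuse a fixed number of worker threads. The following is a minimal sketch, not the code used for the timings above; `MAX_THREADS`, `work`, and `bounded_multithreading_logic` are hypothetical names, and the sleep is shortened so the example finishes quickly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 64  # hypothetical cap; tune for your workload and OS limits

def work(value):
    # Stand-in for the I/O-bound task (the tests above sleep for 1 second)
    time.sleep(0.01)
    return value, threading.get_ident()

def bounded_multithreading_logic(values):
    # At most MAX_THREADS worker threads exist, however many values arrive
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        return list(executor.map(work, values))

if __name__ == "__main__":
    results = bounded_multithreading_logic(range(1000))
    distinct_threads = {thread_id for _, thread_id in results}
    print(len(results), "tasks ran on", len(distinct_threads), "threads")
```

The trade-off is throughput: with a fixed cap, tasks that do not fit in the pool wait for a free worker, so a lower cap means a longer total run for sleep-like tasks.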

Parallelism and concurrency with `concurrent.futures`

Extremely slow, 50.08251595497131 sec.

import os
import psutil
import threading
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def print_info(value):
    time.sleep(1)
    print(
        f"THREAD: {threading.get_ident()}",
        f"PROCESS: {os.getpid()}",
        f"CORE_ID: {psutil.Process().cpu_num()}",
        f"VALUE: {value}",
    )

def multithreading_logic(values):
    with ThreadPoolExecutor() as multithreading_executor:
        multithreading_executor.map(
            print_info,
            values,
        )

def multiprocessing_executor():
    start = time.time()
    with ProcessPoolExecutor() as multiprocessing_executor:
        multiprocessing_executor.map(
            multithreading_logic,
            (range(1000 * x, 1000 * (x + 1)) for x in range(os.cpu_count())),
        )
    end = time.time()
    print(end - start)

if __name__ == "__main__":
    # Guard needed where worker processes are started with "spawn" (e.g., Windows, macOS)
    multiprocessing_executor()

Very important note: with this method, as with asyncio, I can spam as many tasks as I want. For example, I can change the value from 1000 to 10000 to generate 160000 prints and there is no problem (except for the time).
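One likely explanation for the 50 seconds: since Python 3.8, `ThreadPoolExecutor()` without arguments uses `min(32, os.cpu_count() + 4)` worker threads (newer versions use the CPU count available to the process). On a machine reporting 16 logical CPUs that is 20 threads, so 1000 one-second tasks per process run in about 50 rounds. The sketch below does that arithmetic and shows how `max_workers` changes the picture; `default_thread_workers`, `estimated_seconds`, and `run_sleeps` are hypothetical helper names, and the estimate ignores scheduling overhead.

```python
import math
import os
import time
from concurrent.futures import ThreadPoolExecutor

def default_thread_workers(cpu_count=None):
    # Approximates the ThreadPoolExecutor() default pool size (Python 3.8+)
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(32, cpus + 4)

def estimated_seconds(n_tasks, max_workers, task_seconds=1.0):
    # Tasks run in "rounds" of max_workers; each round lasts one task
    return math.ceil(n_tasks / max_workers) * task_seconds

def run_sleeps(n_tasks, max_workers, task_seconds):
    # Time n_tasks sleeping tasks on a pool of the given size
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(time.sleep, [task_seconds] * n_tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    workers = default_thread_workers(16)  # 16 logical CPUs, as in the answer
    print(workers, estimated_seconds(1000, workers))
    # Small live check with short sleeps: 40 tasks on 20 workers vs 40 workers
    print(run_sleeps(40, 20, 0.05), run_sleeps(40, 40, 0.05))
```

Under these assumptions, passing a larger `max_workers` to the inner pool should shrink the number of rounds, at the cost of more threads per process.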

Extra notes

For the comparisons below, I modified each test so that it makes only 1600 prints (replacing the value 1000 with 100).

When I remove the parallelism from asyncio, the execution takes me 16.090194702148438 sec. In addition, if I replace the await asyncio.sleep(1) with time.sleep(1), it takes 160.1889989376068 sec.
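The gap between the two sleeps comes from blocking: `time.sleep` holds the event loop, so the coroutines run one after another, while `await asyncio.sleep` yields control so the waits overlap. A minimal sketch with shortened delays (the names `cooperative`, `blocking`, and `timed_gather` are hypothetical):

```python
import asyncio
import time

DELAY = 0.05  # shortened from the 1 second used in the tests

async def cooperative(_):
    await asyncio.sleep(DELAY)  # yields to the event loop while waiting

async def blocking(_):
    time.sleep(DELAY)  # blocks the whole event loop while waiting

async def timed_gather(coroutine_function, count):
    # Run `count` coroutines together and return the elapsed wall time
    start = time.perf_counter()
    await asyncio.gather(*(coroutine_function(i) for i in range(count)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print("asyncio.sleep:", asyncio.run(timed_gather(cooperative, 10)))
    print("time.sleep:   ", asyncio.run(timed_gather(blocking, 10)))
```

With the blocking version the total time grows with the number of coroutines; with the cooperative version it stays close to a single delay.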

Removing the parallelism from the multithreading option, the execution takes me 16.24941658973694 sec. Right now I am impressed. Multithreading without multiprocessing gives me good performance, very similar to asyncio.

Removing parallelism from the third option, execution takes me 80.15227723121643 sec.

lenik · Answer

As you say: "I have gone through many post that describe multiprocessing and multi-threading and one of the crux that I got is multi-threading is for I/O process and multiprocessing for CPU processes".

You need to figure out whether your program is IO-bound or CPU-bound, then apply the correct method to solve your problem. Applying various methods at random, or all of them at the same time, usually only makes things worse.
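Applied to the question, the doubling is CPU-bound, so a process pool alone is the natural fit; in standard CPython builds the GIL lets only one thread execute Python bytecode at a time, so threads inside each worker add overhead without adding parallelism. Note also that `Executor.map` submits one task per item up front, so mapping a thread pool over a 1,000,000-element row creates a million futures for that row, which may account for much of the memory growth reported. A minimal sketch with a small input; `double_row` and `double_all` are hypothetical names and the sizes are illustrative:

```python
from concurrent.futures import ProcessPoolExecutor

def double_row(row):
    # CPU-bound work on one whole row; no threads inside the worker
    return [2 * elem for elem in row]

def double_all(rows, chunksize=4):
    # One task per row; chunksize batches rows to cut inter-process overhead
    with ProcessPoolExecutor() as executor:
        return list(executor.map(double_row, rows, chunksize=chunksize))

if __name__ == "__main__":
    rows = [list(range(i, i + 1000)) for i in range(32)]
    doubled = double_all(rows)
    print(doubled[0][:5])
```

For an IO-bound workload the same structure would use a thread pool or asyncio instead of the process pool.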

Tags: python, multithreading, multiprocessing, concurrent.futures

learner
