How to use Python and OpenCV with multiprocessing?

Tags: python, image-processing, opencv, parallel-processing, openmp

I'm using Python 3.4.3 and OpenCV 3.0.0 to process (apply various filters to) a very large image (80,000 x 60,000) in memory, and I'd like to use multiple CPU cores to improve performance. After some reading, I arrived at two possible methods:

1) Use Python's multiprocessing module: let each process deal with a slice of the large image and join the results after processing is done. (This should probably be done on a POSIX system?)

2) Since NumPy supports OpenMP and OpenCV uses NumPy, can I just leave the multiprocessing to NumPy?

So my questions are:

Which one is the better solution? (If neither seems reasonable, what would be a possible approach?)

If option 2 is good, should I build both NumPy and OpenCV with OpenMP? How would I actually make multiprocessing happen? (I couldn't really find useful instructions.)

asked Sep 25 '15 05:09

user3667217

3 Answers

After reading some SO posts, I've come up with a way to use OpenCV in Python 3 with multiprocessing. I recommend doing this on Linux because, according to this post, forked child processes share memory with their parent as long as the content is not changed (copy-on-write). Here's a minimal example:

import cv2
import multiprocessing as mp
import numpy as np
import psutil

# Loaded at module level so that forked workers inherit it without copying (Linux).
img = cv2.imread('test.tiff', cv2.IMREAD_ANYDEPTH)  # here I'm using an indexed 16-bit TIFF as an example
num_processes = 4
kernel_size = 11
tile_size = img.shape[0] // num_processes  # assumes img.shape[0] is divisible by num_processes

output = mp.Queue()

def mp_filter(x, output):
    print(psutil.virtual_memory())  # monitor memory usage
    tile = img[tile_size * x:tile_size * (x + 1), :]
    # Queue.put() takes a single object, so send the tile index and the result as a tuple.
    output.put((x, cv2.GaussianBlur(tile, (kernel_size, kernel_size), kernel_size / 5)))
    # Note that you actually have to process a slightly larger block and leave out the border.

if __name__ == '__main__':
    processes = [mp.Process(target=mp_filter, args=(x, output)) for x in range(num_processes)]

    for p in processes:
        p.start()

    # Drain the queue before joining; results arrive in completion order.
    result = []
    for ii in range(num_processes):
        result.append(output.get(True))
    result.sort(key=lambda pair: pair[0])  # restore tile order

    for p in processes:
        p.join()
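
The comment in the example above notes that each worker really has to process a slightly larger block and discard the border. A minimal sketch of that index arithmetic follows. The helper name `tile_bounds` and the halo of `kernel_size // 2` rows are my own assumptions for illustration; they are not part of OpenCV or of the original code.

```python
def tile_bounds(height, num_tiles, halo):
    """Return, per tile, the padded row range to read and the crop to keep.

    Each entry is (read_start, read_stop, keep_start, keep_stop), where the
    keep range is relative to the padded block that was read.
    """
    bounds = []
    base = height // num_tiles
    for i in range(num_tiles):
        start = i * base
        # The last tile absorbs any remainder rows.
        stop = height if i == num_tiles - 1 else start + base
        # Extend the block by `halo` rows on each side, clamped to the image.
        read_start = max(0, start - halo)
        read_stop = min(height, stop + halo)
        keep_start = start - read_start
        keep_stop = keep_start + (stop - start)
        bounds.append((read_start, read_stop, keep_start, keep_stop))
    return bounds


kernel_size = 11
for b in tile_bounds(height=100, num_tiles=4, halo=kernel_size // 2):
    print(b)
```

A worker would then blur `img[read_start:read_stop, :]` and return only rows `keep_start:keep_stop` of the blurred block, so that the rows near tile seams are computed from real neighbouring pixels rather than from the filter's border extrapolation.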

Instead of using a Queue, another way to collect the results from the processes is to create a shared array through the multiprocessing module (this requires importing ctypes):

result = mp.Array(ctypes.c_uint16, img.shape[0]*img.shape[1], lock = False)

Each process can then write to a different portion of the array, provided the portions don't overlap. Creating a large mp.Array is surprisingly slow, however, which can defeat the purpose of speeding up the operation, so use it only when the added time is small compared with the total computation time. The array can be turned into a NumPy array with:

result_np = np.frombuffer(result, dtype=ctypes.c_uint16)
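
To make the shared-array idea concrete, here is a small sketch in which each worker wraps the shared buffer as a 2-D NumPy view and writes only its own rows. It assumes the fork start method (POSIX only), and the names `HEIGHT`, `WIDTH`, `fill_tile` and `run_demo` are made up for illustration; the assignment inside `fill_tile` merely stands in for a real filter.

```python
import ctypes
import multiprocessing as mp

import numpy as np

HEIGHT, WIDTH = 8, 6   # illustrative size, far smaller than a real image
NUM_PROCESSES = 4

# lock=False returns a raw ctypes array with no synchronization wrapper.
shared = mp.Array(ctypes.c_uint16, HEIGHT * WIDTH, lock=False)


def fill_tile(x):
    # Wrap the inherited shared buffer; no pixel data is copied.
    view = np.frombuffer(shared, dtype=np.uint16).reshape(HEIGHT, WIDTH)
    rows = HEIGHT // NUM_PROCESSES
    # Each worker touches only its own rows, so no locking is needed.
    view[rows * x:rows * (x + 1), :] = x + 1   # stand-in for a real filter


def run_demo():
    ctx = mp.get_context('fork')   # assumption: a POSIX system
    procs = [ctx.Process(target=fill_tile, args=(x,))
             for x in range(NUM_PROCESSES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent sees the workers' writes through the same shared memory.
    return np.frombuffer(shared, dtype=np.uint16).reshape(HEIGHT, WIDTH)


if __name__ == '__main__':
    print(run_demo())
```

Note the `reshape`: `np.frombuffer` returns a flat 1-D array, so it has to be reshaped to the image dimensions before row slicing makes sense.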

answered Oct 22 '22 13:10

user3667217

I don't know what types of filters you need, but if they're reasonably simple, you could consider libvips. It's an image processing system for very large images (larger than the amount of memory you have). It came out of a series of EU-funded scientific art imaging projects, so the focus is on the types of operation you need for image capture and comparison: convolution, rank, morphology, arithmetic, colour analysis, resampling, histograms, and so on.

It's fast (faster than OpenCV, on some benchmarks at least), needs little memory, and there's a high-level Python binding. It works on Linux, OS X and Windows. It handles all the multiprocessing for you automatically.

answered Oct 22 '22 14:10

jcupitt

This can be done cleanly with Ray, which is a library for parallel and distributed Python. Ray reasons about "tasks" instead of using a fork-join model, which gives some additional flexibility: you can put values in shared memory even after forking worker processes, the same code runs on multiple machines, you can compose tasks together, and so on.

import cv2
import numpy as np
import ray

num_tasks = 4
kernel_size = 11


@ray.remote
def mp_filter(image, i):
    lower = image.shape[0] // num_tasks * i
    upper = image.shape[0] // num_tasks * (i + 1)
    return cv2.GaussianBlur(image[lower:upper, :],
                            (kernel_size, kernel_size), kernel_size // 5)


if __name__ == '__main__':
    ray.init()

    # Load the image and store it once in shared memory.
    image = np.random.normal(size=(1000, 1000))
    image_id = ray.put(image)

    result_ids = [mp_filter.remote(image_id, i) for i in range(num_tasks)]
    results = ray.get(result_ids)

Note that you can store more than just NumPy arrays in shared memory; you also benefit if you have Python objects that contain NumPy arrays (such as dictionaries containing NumPy arrays). Under the hood, this uses the Plasma shared-memory object store and the Apache Arrow data layout.

You can read more in the Ray documentation. Note that I'm one of the Ray developers.

answered Oct 22 '22 13:10

Robert Nishihara
