I am trying to convert one of my programs to use multiprocessing, preferably the multiprocessing pools since those seem simpler to do. At a high level the process is creating an array of patches from images and then passing them to the GPU for object detection. The CPU and GPU part take about 4s each, however the CPU has 8 cores and it doesn't have to wait for the GPU because no further operations are done to the data after it passes the GPU.
Here is a diagram of how I imagine this should work:

To help the process along I would like a demonstration with a high level version of my implementation. Say we are looping through a list of images in a folder that has 10 images. We resize images 4 at a time. Then we convert them to black and white two at a time, we can take the conversion as the GPU part of the process here. Here is what the code would look like:
def im_resize(im, num1, num2):
return im.resize((num1, num2), Image.ANTIALIAS)
def convert_bw(im):
return im.convert('L')
def read_images(path):
imlist = []
for pathAndFileName in glob.iglob(os.path.join(path, "*")):
if pathAndFileName.endswith(tuple([".jpg", ".JPG"])):
imlist.append(Image.open(pathAndFileName))
return imlist
img_list = read_images("path/to/images/")
final_img_list = []
for image in img_list:
# Resize needs to run concurrently on 4 processes so that the next img_tmp is always ready to go for convert
img_tmp = im_resize(image, 100, 100)
# Convert is limited, need to run on 2 processes
img_tmp = convert_bw(img_tmp)
final_img_list.append(img_tmp)
The reason for the specific number of processes and such is due to system performance metrics, this is what will reduce the runtime. I just want to make sure that the GPU doesn't have to be waiting for the CPU to finish processing images, and I want to have a constant queue filled with pre-processed images ready for the GPU to run. I would preferably want to keep a maximum size on the queue of about 4-10 pre-processed images. If you guys can help me illustrate how I would achieve this with this simplified example I'm sure I can figure out how to translate it into what I need for mine.
Thanks!
Here's a tentative attempt at implementing what you want:
...
# Mapping functions can only take one arg, we provide tuple
def img_resize_splat(a):
img_resize(*a)
if __name__=="__main__":
# Make a CPU pool and a GPU pool
cpu = Pool(4)
gpu = Pool(2)
# Hopefully this returns an iterable, and not a list with all images read into memory
img_list = read_images("path/to/images/")
# I'm assuming you want images to be processed as soon as ready, order doesn't matter
resized = cpu.imap_unordered(img_resize_splat, ((img, 100, 100) for img in img_list))
converted = gpu.imap_unordered(convert_bw, resized)
# This is an iterable with your results, slurp them up one at a time
for bw_img in converted:
# do something
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With