I have a Python app that imports 200k+ images, crops them, and presents the cropped image to pyzbar to interpret a barcode. Cropping helps because there are multiple barcodes on the image and, presumably, pyzbar is a little faster when given smaller images.
Currently I am using Pillow to import and crop the image.
On average, importing and cropping an image takes 262 ms and pyzbar takes 8 ms.
A typical run is about 21 hours.
I wonder if a library other than Pillow might offer substantial improvements in loading/cropping. Ideally the library should be available for macOS, but I could also run the whole thing in a virtual Ubuntu machine.
I am working on a version that can run in parallel processes, which will be a big improvement, but if I could get a 25% or greater speed increase from a different library I would also add that.
As you didn't provide a sample image, I made a dummy file with dimensions 2544x4200 and 1.1 MB in size; it is provided at the end of the answer. I made 1,000 copies of that image and processed all 1,000 images for each benchmark.
As you only gave your code in the comments area, I took it, formatted it and made the best I could of it. I also put it in a loop so it can process many files for just one invocation of the Python interpreter, which becomes important when you have 20,000 files.
That looks like this:
#!/usr/bin/env python3

import sys
from PIL import Image

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
    print(f'Processing: {filename}')
    imgc = Image.open(filename).crop((0, 150, 270, 1050))
My suspicion is that I can make that faster using GNU Parallel and/or pyvips.
Here is a pyvips version of your code:
#!/usr/bin/env python3

import sys
import pyvips
import numpy as np

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
    print(f'Processing: {filename}')
    img = pyvips.Image.new_from_file(filename, access='sequential')
    roi = img.crop(0, 150, 270, 900)   # pyvips crop() takes (left, top, width, height)
    mem_img = roi.write_to_memory()
    # Make a numpy array from that buffer object
    nparr = np.ndarray(buffer=mem_img, dtype=np.uint8,
                       shape=[roi.height, roi.width, roi.bands])
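For completeness, the resulting numpy array can be handed straight to pyzbar. This is just a sketch of that step, assuming pyzbar is already installed; the same call also works on the PIL image from the first version:

from pyzbar.pyzbar import decode

# pyzbar's decode() accepts numpy arrays (and PIL Images) directly
for barcode in decode(nparr):
    print(barcode.type, barcode.data.decode('utf-8'))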
Here are the results:
./orig.py bc*jpg
    224 seconds, i.e. 224 ms per image, same as you

parallel ./orig.py ::: bc*jpg
    55 seconds

parallel -X ./orig.py ::: bc*jpg
    42 seconds

./vipsversion bc*
    30 seconds, i.e. 7x as fast as PIL which was 224 seconds

parallel ./vipsversion ::: bc*
    32 seconds

parallel -X ./vipsversion ::: bc*
    5.2 seconds, i.e. this is the way to go :-)
Note that you can install GNU Parallel on macOS with Homebrew:
brew install parallel
You might take a look at PyTurboJPEG, which is a Python wrapper for libjpeg-turbo with insanely fast rescaling (1/2, 1/4, 1/8) while decoding large JPEG images; the returned numpy.ndarray is handy for image cropping. Moreover, its JPEG encoding speed is also remarkable.
from turbojpeg import TurboJPEG
# specifying library path explicitly
# jpeg = TurboJPEG(r'D:\turbojpeg.dll')
# jpeg = TurboJPEG('/usr/lib64/libturbojpeg.so')
# jpeg = TurboJPEG('/usr/local/lib/libturbojpeg.dylib')
# using default library installation
jpeg = TurboJPEG()
# direct rescaling 1/2 while decoding input.jpg to BGR array
in_file = open('input.jpg', 'rb')
bgr_array_half = jpeg.decode(in_file.read(), scaling_factor=(1, 2))
in_file.close()
# encoding the downscaled BGR array to output.jpg with default settings
out_file = open('output.jpg', 'wb')
out_file.write(jpeg.encode(bgr_array_half))
out_file.close()
libjpeg-turbo prebuilt binaries for macOS and Linux are also available here.
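If you go this route, the decoded numpy.ndarray can be cropped with plain numpy slicing and passed to pyzbar. Below is a rough sketch, not tested against your data: the file name is a placeholder, the crop box is borrowed from the Pillow example above, and I have left out the 1/2 rescaling since that would also require scaling the crop coordinates:

from turbojpeg import TurboJPEG
from pyzbar.pyzbar import decode

jpeg = TurboJPEG()

# Placeholder file name - substitute your own images
with open('input.jpg', 'rb') as f:
    bgr = jpeg.decode(f.read())   # full-resolution BGR ndarray

# Crop with numpy slicing: [upper:lower, left:right],
# reusing the (0, 150, 270, 1050) box from the Pillow example
roi = bgr[150:1050, 0:270]

# The cropped array can be passed straight to pyzbar
barcodes = decode(roi)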