I'm trying to find the fastest approach to read a bunch of images from a directory into a numpy array. My end goal is to compute statistics such as the max, min, and nth percentile of the pixels from all these images. This is straightforward and fast when the pixels from all the images are in one big numpy array, since I can use the inbuilt array methods such as .max and .min, and the np.percentile function.
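For illustration, here is a minimal sketch of computing such statistics once everything is in one array; the random data and shape below are placeholders standing in for 25 images of 512x512 pixels:

import numpy as np

# Placeholder data standing in for 25 images of 512x512 uint16 pixels
imgs = np.random.randint(0, 65535, (25, 512, 512), dtype='uint16')

print(imgs.max())               # maximum over all pixels of all images
print(imgs.min())               # minimum over all pixels
print(np.percentile(imgs, 95))  # e.g. the 95th percentile of all pixel values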
Below are a few example timings with 25 TIFF images (512x512 pixels). These benchmarks are from using %%timeit in a Jupyter notebook. The differences are too small to have any practical implications for just 25 images, but I am intending to read thousands of images in the future.
# Imports
import os
import skimage.io as io
import numpy as np
Appending to a list
%%timeit
imgs = []
img_path = '/path/to/imgs/'
for img in os.listdir(img_path):
    imgs.append(io.imread(os.path.join(img_path, img)))

## 32.2 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using a dictionary
%%timeit
imgs = {}
img_path = '/path/to/imgs/'
for num, img in enumerate(os.listdir(img_path)):
    imgs[num] = io.imread(os.path.join(img_path, img))

## 33.3 ms ± 402 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
For the list and dictionary approaches above, I tried replacing the loop with the respective comprehension, with similar results time-wise. I also tried preallocating the dictionary keys, with no significant difference in the time taken. To get the images from a list into one big array, I would use np.concatenate(imgs), which only takes ~1 ms.
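A rough sketch of that comprehension-plus-np.concatenate variant (the directory path is a placeholder, as elsewhere in the question):

import os
import numpy as np
import skimage.io as io

img_path = '/path/to/imgs/'  # placeholder directory

# Read every image into a list, then join into one (25*512, 512) array
imgs = [io.imread(os.path.join(img_path, f)) for f in os.listdir(img_path)]
big = np.concatenate(imgs)  # default axis=0 stacks the images row-wise
# np.stack(imgs) would instead give a (25, 512, 512) array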
Preallocating a numpy array along the first dimension
%%timeit
imgs = np.ndarray((512*25, 512), dtype='uint16')
img_path = '/path/to/imgs/'
for num, img in enumerate(os.listdir(img_path)):
    imgs[num*512:(num+1)*512, :] = io.imread(os.path.join(img_path, img))

## 33.5 ms ± 804 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Preallocating a numpy array along the third dimension
%%timeit
imgs = np.ndarray((512, 512, 25), dtype='uint16')
img_path = '/path/to/imgs/'
for num, img in enumerate(os.listdir(img_path)):
    imgs[:, :, num] = io.imread(os.path.join(img_path, img))

## 71.2 ms ± 2.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I initially thought the numpy preallocation approaches would be faster, since there is no dynamic variable expansion in the loop, but this does not seem to be the case. The approach I find the most intuitive is the last one, where each image occupies a separate slice along the third axis of the array, but this is also the slowest. The additional time taken is not due to the preallocation itself, which only takes ~1 ms.
I have three questions regarding this:

Is there a faster approach that I am unaware of?
Why is preallocating a numpy array not faster than saving the images in a list or a dictionary?
Is there an even faster way of reading in the images? I tried plt.imread(), but the scikit-image.io module is faster.
Part A : Accessing and assigning NumPy arrays
Going by the way elements are stored in row-major order for NumPy arrays, you are doing the right thing when storing those elements along the last axis per iteration. These would occupy contiguous memory locations and as such would be the most efficient for accessing and assigning values into. Thus initializations like np.ndarray((512*25, 512), dtype='uint16') or np.ndarray((25, 512, 512), dtype='uint16') would work the best, as also mentioned in the comments.
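As a sketch of what the recommended (25, 512, 512) layout would look like in the original reading loop (the path is a placeholder and skimage.io is assumed, as in the question):

import os
import numpy as np
import skimage.io as io

img_path = '/path/to/imgs/'  # placeholder directory
files = os.listdir(img_path)

# Each image fills one contiguous (512, 512) block along the first axis
imgs = np.empty((len(files), 512, 512), dtype='uint16')
for num, fname in enumerate(files):
    imgs[num] = io.imread(os.path.join(img_path, fname))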
After wrapping those approaches as functions for timing, and feeding in random arrays instead of images -
N = 512
n = 25
a = np.random.randint(0, 255, (N, N))

def app1():
    imgs = np.empty((N, N, n), dtype='uint16')
    for i in range(n):
        imgs[:, :, i] = a  # Storing along the first two axes
    return imgs

def app2():
    imgs = np.empty((N*n, N), dtype='uint16')
    for num in range(n):
        imgs[num*N:(num+1)*N, :] = a  # Storing along the last axis
    return imgs

def app3():
    imgs = np.empty((n, N, N), dtype='uint16')
    for num in range(n):
        imgs[num, :, :] = a  # Storing along the last two axes
    return imgs

def app4():
    imgs = np.empty((N, n, N), dtype='uint16')
    for num in range(n):
        imgs[:, num, :] = a  # Storing along the first and last axes
    return imgs
Timings -
In [45]: %timeit app1()
    ...: %timeit app2()
    ...: %timeit app3()
    ...: %timeit app4()
10 loops, best of 3: 28.2 ms per loop
100 loops, best of 3: 2.04 ms per loop
100 loops, best of 3: 2.02 ms per loop
100 loops, best of 3: 2.36 ms per loop
Those timings confirm the performance theory proposed at the start, though I expected the timings for the last setup to fall between those of app3 and app1; maybe the effect of going from the last to the first axis for accessing and assigning isn't linear. More investigation on this one could be interesting (follow-up question here).
To clarify schematically, consider that we are storing two image arrays, denoted by x (image 1) and o (image 2); we would have:
App1 :
[[[x o] [x o] [x o] [x o] [x o]]
 [[x o] [x o] [x o] [x o] [x o]]
 [[x o] [x o] [x o] [x o] [x o]]]
Thus, in memory space, it would be: [x,o,x,o,x,o..], following row-major order.
App2 :
[[x x x x x]
 [x x x x x]
 [x x x x x]
 [o o o o o]
 [o o o o o]
 [o o o o o]]
Thus, in memory space, it would be: [x,x,x,x,x,x...o,o,o,o,o..].
App3 :
[[[x x x x x]
  [x x x x x]
  [x x x x x]]

 [[o o o o o]
  [o o o o o]
  [o o o o o]]]
Thus, in memory space, it would be the same as the previous one.
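One way to verify this layout effect is to inspect the contiguity and strides of the slice each iteration writes into; a small sketch:

import numpy as np

N, n = 512, 25
first = np.empty((n, N, N), dtype='uint16')  # image indexed along the first axis
last = np.empty((N, N, n), dtype='uint16')   # image indexed along the last axis

# The block written per iteration is contiguous only in the first layout
print(first[0].flags['C_CONTIGUOUS'])       # True: one contiguous block per image
print(last[:, :, 0].flags['C_CONTIGUOUS'])  # False: strided writes, hence slower
print(first.strides, last.strides)          # byte steps per axis for each layout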
Part B : Reading images from disk as arrays
Now, for the part on reading images, I have found OpenCV's imread to be much faster.
As a test, I downloaded the Mona Lisa image from its Wikipedia page and tested the performance of reading it -
import cv2  # OpenCV

In [521]: %timeit io.imread('monalisa.jpg')
100 loops, best of 3: 3.24 ms per loop

In [522]: %timeit cv2.imread('monalisa.jpg')
100 loops, best of 3: 2.54 ms per loop
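One caveat not covered by the timing above: cv2.imread by default converts images to 8-bit BGR and returns None instead of raising on failure, so for 16-bit TIFFs like those in the question you would want something along these lines (the file name is a placeholder):

import cv2  # OpenCV

# IMREAD_UNCHANGED keeps the original bit depth and channel layout
img = cv2.imread('/path/to/imgs/example.tif', cv2.IMREAD_UNCHANGED)
if img is None:  # cv2.imread signals failure by returning None
    raise IOError('could not read the image')
print(img.dtype, img.shape)  # expect uint16 for a 16-bit TIFF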
In this case, most of the time will be spent reading the files from disk, and I wouldn't worry too much about the time to populate a list.
In any case, here is a script comparing four methods without the overhead of reading an actual image from disk, just copying an object that is already in memory.
import numpy as np
import time
from functools import wraps

x, y = 512, 512
img = np.random.randn(x, y)
n = 1000

def timethis(func):
    # Decorator that reports the wall-clock time of each call
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        r = func(*args, **kwargs)
        end = time.perf_counter()
        print('{}.{} : {} milliseconds'.format(func.__module__, func.__name__, (end - start)*1e3))
        return r
    return wrapper

@timethis
def static_list(n):
    # Preallocate the list, then fill it
    imgs = [None]*n
    for i in range(n):
        imgs[i] = img
    return imgs

@timethis
def dynamic_list(n):
    # Grow the list with append
    imgs = []
    for i in range(n):
        imgs.append(img)
    return imgs

@timethis
def list_comprehension(n):
    return [img for i in range(n)]

@timethis
def numpy_flat(n):
    # Preallocate one big 2D array and copy each image into it
    imgs = np.ndarray((x*n, y))
    for i in range(n):
        imgs[x*i:(i+1)*x, :] = img

static_list(n)
dynamic_list(n)
list_comprehension(n)
numpy_flat(n)
The results show:
__main__.static_list : 0.07004200006122119 milliseconds
__main__.dynamic_list : 0.10294799994881032 milliseconds
__main__.list_comprehension : 0.05021800006943522 milliseconds
__main__.numpy_flat : 309.80870099983804 milliseconds
Obviously your best bet is the list comprehension; however, even when populating a numpy array, it is just 310 ms for reading 1000 images (from memory). So again, the overhead will be the disk read.
Why is numpy slower?
It is the way numpy stores the array in memory. If we modify the Python list functions to convert the list to a numpy array, the times are similar.
The modified functions, now returning numpy arrays:
@timethis
def static_list(n):
    imgs = [None]*n
    for i in range(n):
        imgs[i] = img
    return np.array(imgs)

@timethis
def dynamic_list(n):
    imgs = []
    for i in range(n):
        imgs.append(img)
    return np.array(imgs)

@timethis
def list_comprehension(n):
    return np.array([img for i in range(n)])
and the timing results:
__main__.static_list : 303.32892100022946 milliseconds
__main__.dynamic_list : 301.86925499992867 milliseconds
__main__.list_comprehension : 300.76925699995627 milliseconds
__main__.numpy_flat : 305.9309459999895 milliseconds
So it is just a numpy thing that it takes more time, and that extra time is a roughly constant cost for a given array size, regardless of how the array is populated...
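To make that concrete, here is a rough sketch that times only the final copy into one contiguous array, using the same sizes as the script above (note it allocates roughly 2 GB of float64 data):

import time
import numpy as np

x, y, n = 512, 512, 1000
img = np.random.randn(x, y)
imgs = [img for _ in range(n)]  # cheap: the list only stores references

start = time.perf_counter()
big = np.stack(imgs)  # one big copy into a contiguous (n, x, y) array, ~2 GB here
print('copy took {:.1f} ms'.format((time.perf_counter() - start) * 1e3))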