I am writing a Python module that needs to calculate the mean and standard deviation of pixel values across 1000+ arrays (all with identical dimensions).
I am looking for the fastest way to do this.
Currently I am looping through the arrays and using numpy.dstack to stack the 1000 arrays into one rather large 3D array, and then calculating the mean across the third dimension. Each array has shape (5000, 4000).
This approach is taking quite a long time!
Would anyone be able to advise on a more efficient method of approaching this problem?
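For reference, a rough sketch of what I am doing now (the read_array helper and filenames list are placeholders for my actual loading code):

    import numpy as np

    # Placeholder: load all 1000+ arrays into memory
    arrays = [read_array(filename) for filename in filenames]

    # Stack into a (5000, 4000, N) array, then reduce over the last axis
    stacked = np.dstack(arrays)
    mean_image = stacked.mean(axis=2)
    std_image = stacked.std(axis=2)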
Maybe you could calculate the mean and std in a cumulative way, something like this (untested):
    import numpy as np

    im_size = (5000, 4000)
    cum_sum = np.zeros(im_size)
    cum_sum_of_squares = np.zeros(im_size)
    n = 0

    for filename in filenames:
        image = read_your_image(filename)  # only one image in memory at a time
        cum_sum += image
        cum_sum_of_squares += image ** 2
        n += 1

    # Per-pixel mean and standard deviation from the running sums
    mean_image = cum_sum / n
    std_image = np.sqrt(cum_sum_of_squares / n - mean_image ** 2)
This is probably limited by how fast you can read images from disk, not by memory, since you only ever have one image in memory at a time. Calculating std this way can suffer from numerical problems, because you may be subtracting two large numbers. If that is a problem, you have to loop over the files twice: first to calculate the mean, then to accumulate (image - mean_image)**2 in a second pass.
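A minimal sketch of that two-pass variant, assuming the same filenames list and read_your_image helper as above (both are placeholders for your own loading code):

    import numpy as np

    im_size = (5000, 4000)

    # First pass: accumulate the per-pixel mean
    cum_sum = np.zeros(im_size)
    n = 0
    for filename in filenames:
        cum_sum += read_your_image(filename)
        n += 1
    mean_image = cum_sum / n

    # Second pass: accumulate squared deviations from the mean
    cum_sq_dev = np.zeros(im_size)
    for filename in filenames:
        cum_sq_dev += (read_your_image(filename) - mean_image) ** 2
    std_image = np.sqrt(cum_sq_dev / n)

This avoids the large-number cancellation at the cost of reading every file twice.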