I have a program that I wrote on my Mac, but it won't run on my Raspberry Pi because it runs out of RAM (MemoryError).
The essence of the program is some image processing: it convolves a 640x480 uint8 image with a complex128 kernel of twice the size (1280x960).
I figure the memory usage is:
Initial image:
640 x 480 x 8 bits / 8 bits per byte / 1024 = 300 KiB
Complex matrix:
640 x 480 x 2^2 x 128 bits / 8 bits per byte / 1024^2 = 18.75 MiB
Let's assume it has to hold perhaps two or three copies of these various matrices in memory - that should be a fairly small footprint - perhaps < 100 MB. Unfortunately it seems to be exhausting the full 330MB available (the Python runtime must load into this space as well).
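To double-check my arithmetic, here is the same calculation done with NumPy (the 1280x960 kernel shape is just my reading of "twice the size" in each dimension):

import numpy as np

img = np.zeros((480, 640), dtype=np.uint8)            # the 640x480 image
kernel = np.zeros((960, 1280), dtype=np.complex128)   # the "twice the size" complex kernel

print(img.nbytes / 1024)      # 300.0  -> 300 KiB
print(kernel.nbytes / 2**20)  # 18.75  -> 18.75 MiB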
UPDATE:
As suggested below, I've done some memory profiling. It is indeed the fftconvolve call that spikes the RAM usage, as follows:
Line #    Mem usage    Increment   Line Contents
================================================
    65   86.121 MiB    0.000 MiB   @profile
    66                             def iriscode(self):
    67   86.121 MiB    0.000 MiB       img = self.polar
    68
    69   86.379 MiB    0.258 MiB       pupil_curve = find_max(img[0:40])
    70   86.379 MiB    0.000 MiB       blur = cv2.GaussianBlur(self.polar, (9, 9), 0)
    71   76.137 MiB  -10.242 MiB       iris_fft = fit_iris_fft(radial_diff(blur[50:,:])) + 50
    72
    73   76.160 MiB    0.023 MiB       img = warp(img, iris_fft, pupil_curve)
    74                                 # cv2.imshow("mask",np.uint8(ma.getmaskarray(img))*255)
    75
    76                                 global GABOR_FILTER
    77  262.898 MiB  186.738 MiB       output = signal.fftconvolve(GABOR_FILTER, img, mode="valid")
Still, the magnitude of this increase surprises me. Any ideas what I can do to reduce it? I tried using complex64 instead of complex128, but the memory usage was the same.
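For reference, the line-by-line profile above was produced with memory_profiler; the setup is roughly this (the class name below is just a placeholder for my actual class):

# pip install memory_profiler
# run with: python -m memory_profiler my_script.py   (script name is a placeholder)
from memory_profiler import profile

class IrisSegmenter:              # placeholder name
    @profile                      # prints the per-line Mem usage / Increment table shown above
    def iriscode(self):
        ...                       # body as in the profile output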
To understand what's going on, you can look at the source code of scipy.signal.fftconvolve.
The whole idea behind Fourier-transform convolution is that convolution in the time domain is simply elementwise multiplication in the frequency domain. But because you are using the FFT, it will treat your functions as if they were periodic, i.e. it is as if the convolution kernel wrapped around the edges. So to get proper results, the arrays are padded with zeros to a common shape, which in your case will be (1280+640-1, 960+480-1) = (1919, 1439). To speed up calculations, this shape gets further expanded to the next larger number which has only 2, 3 or 5 as prime factors, which in your case leads to a (1920, 1440) shape. For a complex128 array, that takes up 1920 * 1440 * 16 / 2**20 = 42 MiB.
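You can verify that shape arithmetic yourself. Recent SciPy exposes the smooth-size search as scipy.fft.next_fast_len (in the fftconvolve you are running it is an internal helper), so treat this as a sketch:

from scipy.fft import next_fast_len   # scipy.fftpack.next_fast_len in older SciPy

s_kernel, s_img = (1280, 960), (640, 480)
full = [a + b - 1 for a, b in zip(s_kernel, s_img)]   # [1919, 1439]
fshape = [next_fast_len(n) for n in full]             # [1920, 1440]

print(full, fshape)
print(fshape[0] * fshape[1] * 16 / 2**20)             # 42.1875 -> ~42 MiB per complex128 array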
You are going to have 2 such arrays, one for each of your inputs, plus two more when you compute their respective FFTs, plus another one when you multiply them together, plus yet another one when you compute their inverse FFT to get your convolution.
It is not clear that all of these arrays will coexist simultaneously, as some may get garbage collected along the way, but at least 3 of them will exist at some point, probably 4. Add some overhead from the FFT calculations, and you have your 186 MiB explained.
You may want to try non-FFT convolution, which should not require all that padding. You could also try to slightly optimize the code in scipy.signal.fftconvolve. Replacing this else block with:
else:
    ret = fftn(in1, fshape)
    ret *= fftn(in2, fshape)   # multiply in place instead of allocating a new product array
    ret = ifftn(ret)[fslice].copy()
should get rid of one of the intermediate copies, and give you 40 extra MiB, which may do the trick for your case.
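If you would rather not patch SciPy itself, the same idea can be written as a small standalone helper on top of numpy.fft. The function below is my own sketch (the name, the use of next_fast_len, and the valid-mode cropping are mine, modelled on what fftconvolve does), not something shipped by SciPy:

import numpy as np
from scipy.fft import next_fast_len   # scipy.fftpack.next_fast_len in older SciPy

def fftconvolve_valid_lowmem(in1, in2):
    """Valid-mode FFT convolution with one less full-size temporary."""
    s1, s2 = np.array(in1.shape), np.array(in2.shape)
    shape = s1 + s2 - 1                                # full linear-convolution size
    fshape = [next_fast_len(int(n)) for n in shape]    # pad up to a fast FFT size

    ret = np.fft.fftn(in1, fshape)
    ret *= np.fft.fftn(in2, fshape)                    # in-place multiply, no third array
    ret = np.fft.ifftn(ret)

    # drop the padding, then take the centred "valid" region like fftconvolve does
    full = ret[tuple(slice(0, int(n)) for n in shape)]
    newshape = np.abs(s1 - s2) + 1
    start = (shape - newshape) // 2
    valid = tuple(slice(int(a), int(a + n)) for a, n in zip(start, newshape))
    return full[valid].copy()

Calling output = fftconvolve_valid_lowmem(GABOR_FILTER, img) should then be a drop-in replacement for the signal.fftconvolve call in your profile, saving roughly one 42 MiB temporary.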