 

Memory usage in Numpy

I have a program that I've written on my Mac, but it won't run on my Raspberry Pi because it runs out of RAM (MemoryError).

The essence of the program is some image processing: it convolves a 640x480 uint8 image with a complex128 filter of twice the size in each dimension (1280x960).

I figure the memory usage is: Initial image:

640 x 480 x 8 bits / 8 bits per byte / 1024 = 300 KiB

Complex matrix:

640 x 480 x 2^2 x 128 bits / 8 bits per byte / 1024^2 = 18.75 MiB

Let's assume it has to hold perhaps two or three copies of these various matrices in memory - that should be a fairly small footprint - perhaps < 100 MB. Unfortunately it seems to be exhausting the full 330MB available (the Python runtime must load into this space as well).
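A quick sanity check of those two figures with numpy's nbytes (assuming the complex filter really is 1280x960) gives the same numbers:

import numpy as np

img = np.zeros((480, 640), dtype=np.uint8)          # the initial image
filt = np.zeros((960, 1280), dtype=np.complex128)   # the filter, twice the size in each dimension (assumed)

print(img.nbytes / 1024)    # 300.0  -> 300 KiB
print(filt.nbytes / 2**20)  # 18.75  -> 18.75 MiB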

  1. Is my analysis correct?
  2. Any tips on how to manage memory a bit better in Python?

UPDATE:

As suggested below, I've done some memory profiling; it is indeed the fftconvolve call that spikes the RAM usage, as follows:

Line # Mem usage Increment Line Contents

65   86.121 MiB    0.000 MiB     @profile
66                               def iriscode(self):
67   86.121 MiB    0.000 MiB       img = self.polar
68
69   86.379 MiB    0.258 MiB       pupil_curve = find_max(img[0:40])
70   86.379 MiB    0.000 MiB       blur = cv2.GaussianBlur(self.polar, (9, 9), 0)
71   76.137 MiB  -10.242 MiB       iris_fft = fit_iris_fft(radial_diff(blur[50:,:])) + 50
72
73   76.160 MiB    0.023 MiB       img = warp(img, iris_fft, pupil_curve)
74                                 # cv2.imshow("mask",np.uint8(ma.getmaskarray(img))*255)
75
76                                 global GABOR_FILTER
77  262.898 MiB  186.738 MiB       output = signal.fftconvolve(GABOR_FILTER, img, mode="valid")
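(For reference, the table above is the line-by-line output of memory_profiler; the usual invocation, with the function decorated with @profile, looks roughly like this — the script name is just a placeholder:)

pip install memory_profiler
python -m memory_profiler my_script.py   # prints a per-line "Mem usage / Increment" table
                                         # for every function decorated with @profile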

Still, the magnitude of this increase surprises me. Any ideas what I can do to reduce it? I tried using complex64 instead of complex128 but the memory usage was the same.

asked Jun 10 '14 by cjm2671

1 Answer

To understand what's going on, you can look at the source code of fftconvolve here.

The whole idea behind Fourier-transform convolution is that convolution in the time domain is simply elementwise multiplication in the frequency domain. But because you are using the FFT, it treats your arrays as if they were periodic, i.e. it is as if the convolution kernel wrapped around the edges. So to get proper results, the arrays are padded with zeros to a common shape, which in your case will be (1280+640-1, 960+480-1) = (1919, 1439). To speed up calculations, this shape gets further expanded to the next larger numbers that have only 2, 3 or 5 as prime factors, which in your case leads to a (1920, 1440) shape. For a complex128 array, that takes up 1920 * 1440 * 16 / 2**20 ≈ 42 MiB.
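A quick sketch of that arithmetic (scipy.fft.next_fast_len is the current public helper for the "fast" FFT size; it is an assumption that it picks the same sizes as the fftconvolve internals did back then, but for these shapes it reproduces the 1440 x 1920 figure above):

import numpy as np
from scipy.fft import next_fast_len

s_img, s_filt = (480, 640), (960, 1280)             # numpy shapes of the image and the filter
full = [a + b - 1 for a, b in zip(s_img, s_filt)]   # [1439, 1919], the full linear-convolution shape
fshape = [next_fast_len(n) for n in full]           # [1440, 1920], padded to FFT-friendly sizes
per_array = np.prod(fshape) * 16 / 2**20            # complex128 takes 16 bytes per element
print(fshape, round(per_array, 1))                  # [1440, 1920] 42.2  (MiB per padded array)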

You are going to have 2 such arrays, one for each of your inputs, plus two more when you compute their respective FFTs, plus another one when you multiply them together, plus yet another one when you compute their inverse FFT to get your convolution.

It is not clear that all of these arrays will coexist simultaneously, as some may get garbage collected along the way, but there will at least be 3 of them at some point, probably 4. Add some overhead from the FFT calculations, and you have your 186 MiB explained.
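Putting rough numbers on that (the count of four arrays alive at the peak is an assumption, per the previous paragraph):

per_array = 1440 * 1920 * 16 / 2**20   # ~42.2 MiB for each padded complex128 array
print(4 * per_array)                   # ~168.8 MiB, plus FFT working memory, against the 186.7 MiB increment observed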

You may want to try non-FFT convolution, which should not require all that padding (there is a sketch of that below, after the patch). You could also try to slightly optimize the code in scipy.signal.fftconvolve: replacing this else block with:

else:
    ret = fftn(in1, fshape)            # FFT of the first padded input
    ret *= fftn(in2, fshape)           # in-place multiply: reuses ret instead of allocating the product separately
    ret = ifftn(ret)[fslice].copy()    # inverse FFT, then slice out the requested region

should get rid of one of the intermediate copies, and give you 40 extra MiB, which may do the trick for your case.
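For the non-FFT route mentioned above, a minimal sketch with scipy.signal.convolve2d (direct convolution avoids the big zero-padded complex arrays, but the work grows with the product of the two array sizes, so expect it to be much slower for arrays this large — this is only the shape of the idea, not something tested against your code):

from scipy import signal

# direct (non-FFT) 2-D convolution: memory stays close to the inputs and output,
# no padding to a 1440x1920 complex array, but it is computationally much heavier
output = signal.convolve2d(GABOR_FILTER, img, mode="valid")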

answered Sep 28 '22 by Jaime