Optimising memory usage in numpy

The following program loads two images with PyGame, converts them to Numpy arrays, and then performs some other Numpy operations (such as FFT) to emit a final result (of a few numbers). The inputs can be large, but at any moment only one or two large objects should be live.

A test image is about 10M pixels, which translates to 10 MB once it's greyscaled. It gets converted to a Numpy array of dtype uint8, which after some processing (applying Hamming windows) becomes an array of dtype float64. Two images are loaded into arrays this way; later FFT steps result in an array of dtype complex128. Prior to adding the excessive gc.collect calls, the program's memory footprint tended to increase with each step. Additionally, it seems most Numpy operations give their result in the highest precision available.
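
For illustration (my own sketch, not from the original program, with a made-up image size), the dtype escalation looks like this:

import numpy

# A stand-in greyscale image: 10 million uint8 pixels, ~10 MB.
a = numpy.zeros((2500, 4000), dtype=numpy.uint8)
print(a.nbytes)    # 10000000 -- one byte per element

# Multiplying by a float64 Hamming window promotes the result to float64, ~80 MB.
b = a * numpy.hamming(a.shape[1])
print(b.nbytes)    # 80000000

# An FFT of the float64 array yields complex128, ~160 MB.
c = numpy.fft.fft2(b)
print(c.nbytes)    # 160000000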

Running the test (sans the gc.collect calls) on my 1 GB Linux machine results in prolonged thrashing, which I have not waited out. I don't yet have detailed memory-use stats -- I tried some Python modules and the time command to no avail, and am now looking into valgrind. Watching ps (and dealing with machine unresponsiveness in the later stages of the test) suggests a maximum memory usage of about 800 MB.

A 10-million-element array of complex128 should occupy 160 MB. Having (ideally) at most two of these live at one time, plus the not-insubstantial Python and Numpy libraries and other paraphernalia, probably means allowing for 500 MB.

I can think of two angles from which to attack the problem:

  • Discarding intermediate arrays as soon as possible. That's what the gc.collect calls are for -- they seem to have improved the situation, as it now completes with only a few minutes of thrashing ;-). I think one can expect that memory-intensive programming in a language like Python will require some manual intervention.

  • Using less-precise Numpy arrays at each step. Unfortunately the operations that return arrays, like fft2, do not appear to allow the output type to be specified. (A sketch of a lower-precision, in-place variant of the windowing step follows this list.)
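
For the windowing step, here is a minimal sketch of both ideas at once (my own, not from the original program): cast the pixel data to float32 once, up front, then apply the Hamming windows in place so no further full-size temporaries are created. Broadcasting the first window over a new axis also removes the two transposes:

import numpy

def apply_hamming_inplace(a):
    # One full-size copy: uint8 pixels cast to float32 (4 bytes per element instead of 8).
    a = a.astype(numpy.float32)
    hw1 = numpy.hamming(a.shape[0]).astype(numpy.float32)
    hw2 = numpy.hamming(a.shape[1]).astype(numpy.float32)
    # In-place multiplies: no additional full-size arrays are allocated.
    a *= hw1[:, numpy.newaxis]   # window along axis 0
    a *= hw2                     # window along axis 1, via broadcasting
    return a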

So my main question is: is there a way of specifying output precision in Numpy array operations?

More generally, are there other common memory-conserving techniques when using Numpy?

Additionally, does Numpy have a more idiomatic way of freeing array memory? (I imagine this would leave the array object live in Python, but in an unusable state.) Explicit deletion followed by immediate GC feels hacky.
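
For context on that last point: in CPython, dropping the last reference to an array with del frees its data buffer immediately via reference counting; gc.collect only matters when reference cycles keep an object alive. A minimal illustration (mine, not part of the original post), with a made-up shape:

import numpy

a = numpy.zeros((10000, 1000), dtype=numpy.complex128)  # ~160 MB buffer
b = a        # a second reference to the same buffer, no copy made
del a        # buffer is still alive through b
del b        # last reference gone: CPython's reference counting releases
             # the 160 MB immediately, with no need for gc.collect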

import sys
import numpy
import pygame
import gc


def get_image_data(filename):
    im = pygame.image.load(filename)
    im2 = im.convert(8)                 # reduce to 8 bits per pixel
    a = pygame.surfarray.array2d(im2)   # 2-D uint8 pixel array
    hw1 = numpy.hamming(a.shape[0])
    hw2 = numpy.hamming(a.shape[1])
    a = a.transpose()
    a = a*hw1                           # multiplying by the float64 window promotes a to float64
    a = a.transpose()
    a = a*hw2
    return a


def check():
    # force an immediate collection between pipeline steps
    gc.collect()
    print 'check'


def main(args):
    pygame.init()

    pygame.sndarray.use_arraytype('numpy')

    filename1 = args[1]
    filename2 = args[2]
    im1 = get_image_data(filename1)
    im2 = get_image_data(filename2)
    check()
    out1 = numpy.fft.fft2(im1)
    del im1
    check()
    out2 = numpy.fft.fft2(im2)
    del im2
    check()
    out3 = out1.conjugate() * out2      # cross-power spectrum
    del out1, out2
    check()
    correl = numpy.fft.ifft2(out3)      # circular cross-correlation of the two images
    del out3
    check()
    maxs = correl.argmax()              # flat index of the correlation peak
    maxpt = maxs % correl.shape[0], maxs / correl.shape[0]
    print correl[maxpt], maxpt, (correl.shape[0] - maxpt[0], correl.shape[1] - maxpt[1])


if __name__ == '__main__':
    args = sys.argv
    exit(main(args))
Asked by Edmund, Jun 29 '10


1 Answer

This on SO says "Scipy 0.8 will have single precision support for almost all the fft code", and SciPy 0.8.0 beta 1 is just out.
(Haven't tried it myself, cowardly.)
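
If that pans out, the point would be to keep the data in float32 so that each transform comes back as complex64, halving the footprint relative to numpy.fft.fft2's complex128 output. A rough sketch (untested, and assuming SciPy 0.8's single-precision FFT support works as described):

import numpy
import scipy.fftpack

# Stand-in for a windowed image, kept in single precision.
im = numpy.random.rand(2500, 4000).astype(numpy.float32)

# With float32 input, scipy.fftpack.fft2 should return complex64 (~80 MB)
# rather than the complex128 (~160 MB) that numpy.fft.fft2 always produces.
out = scipy.fftpack.fft2(im)
print(out.dtype)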

Answered by denis, Sep 21 '22