 

Memory leak by ctypes pointers used within python class

I'm trying to wrap some C code via ctypes. Although my code (attached below) is functional, memory_profiler suggests it is suffering from a memory leak somewhere. The basic C struct I'm trying to wrap is defined in 'image.h'. It defines an image object containing a pointer to the data, an array of row pointers (needed for various other functions not included here), and some shape information.

image.h:

#include <stdio.h>
#include <stdlib.h>

typedef struct image {
    double *data;      /*< The main pointer to the image data*/
    double **row;      /*< An array of pointers to each row of the image*/
    unsigned long n;   /*< The total number of pixels in the image*/
    unsigned long nx;  /*< The number of pixels per row (horizontal image dimension)*/
    unsigned long ny;  /*< The number of pixels per column (vertical image dimension)*/
} image;

The Python code that wraps this C struct via ctypes is contained in 'image_wrapper.py' below. The Python class Image implements many more methods, which I didn't include here. The idea is to have a Python object that is as convenient to use as a numpy array. In fact, the class contains a numpy array as an attribute (self.array) which points to exactly the same memory location as the data pointer within the C struct.
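A minimal sketch of that memory sharing, independent of the Image class (the 3 x 4 shape is an arbitrary choice for illustration): ndarray.ctypes.data_as() wraps the address of the array's existing buffer without copying, so writes through the ctypes pointer show up in the numpy array, and a row pointer is simply the base address plus an offset.

```python
import ctypes as c
import numpy

# A C-contiguous array of doubles, as in Image.__init__ (3 rows of 4 pixels here).
array = numpy.zeros((3, 4), order='C', dtype=c.c_double)

# data_as() wraps the array's existing buffer address; nothing is copied.
data_pointer = array.ctypes.data_as(c.POINTER(c.c_double))

# Writing through the pointer changes the numpy array: flat index 5 is row 1, column 1.
data_pointer[5] = 42.0
print(array[1, 1])  # 42.0

# A row pointer is the base address plus row_index * row_length * sizeof(double).
row1 = array[1, :].ctypes.data_as(c.POINTER(c.c_double))
row1_offset = c.cast(row1, c.c_void_p).value - c.cast(data_pointer, c.c_void_p).value
print(row1_offset == 1 * 4 * c.sizeof(c.c_double))  # True
```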

image_wrapper.py:

import numpy
import ctypes as c

class Image(object):

    def __init__(self, nx, ny):

        self.nx = nx
        self.ny = ny
        self.n = nx * ny
        self.shape = tuple((nx, ny))
        self.array = numpy.zeros((nx, ny), order='C', dtype=c.c_double)
        self._argtype = self._argtype_generator()
        self._update_cstruct_from_array()

    def _update_cstruct_from_array(self):

        data_pointer = self.array.ctypes.data_as(c.POINTER(c.c_double))

        ctypes_pointer = c.POINTER(c.c_double) * self.ny
        row_pointers = ctypes_pointer(
            *[self.array[i,:].ctypes.data_as(c.POINTER(c.c_double)) for i in range(self.ny)])

        ctypes_pointer = c.POINTER(ctypes_pointer)
        row_pointer = ctypes_pointer(row_pointers)

        self._cstruct = c.pointer(self._argtype(data=data_pointer,
                                                row=row_pointer,
                                                n=self.n,
                                                nx=self.nx,
                                                ny=self.ny))

    def _argtype_generator(self):

        class _Argtype(c.Structure):
            _fields_ = [("data", c.POINTER(c.c_double)),
                        ("row", c.POINTER(c.POINTER(c.c_double) * self.ny)),
                        ("n", c.c_ulong),
                        ("nx", c.c_ulong),
                        ("ny", c.c_ulong)]

        return _Argtype

Testing the memory consumption of the above code with memory_profiler suggests that Python's garbage collector is unable to clean up all references. Here is my test code, which creates a varying number of class instances in loops of different sizes.

test_image_wrapper.py:

import sys
import image_wrapper as img
import numpy as np 

@profile
def main(argv):
    image_size = 500

    print 'Create 10 images\n'
    for i in range(10):
        x = img.Image(image_size, image_size)
        del x

    print 'Create 100 images\n'
    for i in range(100):
        x = img.Image(image_size, image_size)
        del x

    print 'Create 1000 images\n'
    for i in range(1000):
        x = img.Image(image_size, image_size)
        del x

    print 'Create 10000 images\n'
    for i in range(10000):
        x = img.Image(image_size, image_size)
        del x

if __name__ == "__main__":
    main(sys.argv)

The @profile decorator tells memory_profiler to analyse the decorated function, here main. Running Python with memory_profiler on test_image_wrapper.py via

python -m memory_profiler test_image_wrapper.py

yields the following output:

Filename: test_image_wrapper.py

Line #    Mem usage    Increment   Line Contents
================================================
    49                             @profile
    50                             def main(argv):
    51                                 """
    52                                 Script to test memory usage of image.py
    53    16.898 MB     0.000 MB       """
    54    16.898 MB     0.000 MB       image_size = 500
    55                             
    56    16.906 MB     0.008 MB       print 'Create 10 images\n'
    57    19.152 MB     2.246 MB       for i in range(10):
    58    19.152 MB     0.000 MB           x = img.Image(image_size, image_size)
    59    19.152 MB     0.000 MB           del x
    60                             
    61    19.152 MB     0.000 MB       print 'Create 100 images\n'
    62    19.512 MB     0.359 MB       for i in range(100):
    63    19.516 MB     0.004 MB           x = img.Image(image_size, image_size)
    64    19.516 MB     0.000 MB           del x
    65                             
    66    19.516 MB     0.000 MB       print 'Create 1000 images\n'
    67    25.324 MB     5.809 MB       for i in range(1000):
    68    25.328 MB     0.004 MB           x = img.Image(image_size, image_size)
    69    25.328 MB     0.000 MB           del x
    70                             
    71    25.328 MB     0.000 MB       print 'Create 10000 images\n'
    72    83.543 MB    58.215 MB       for i in range(10000):
    73    83.543 MB     0.000 MB           x = img.Image(image_size, image_size)
    74                                     del x

Each instance of the class Image seems to leave about 5-6 kB behind, adding up to ~58 MB when processing 10k images. For an individual object this is not much, but since I have to process ten million images, I do care. The line that seems to cause the leak is the following one in image_wrapper.py:

        self._cstruct = c.pointer(self._argtype(data=data_pointer,
                                                row=row_pointer,
                                                n=self.n,
                                                nx=self.nx,
                                                ny=self.ny))
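A back-of-the-envelope check of these numbers, assuming memory_profiler's "MB" means 2**20 bytes and that the cost per image stays constant at larger scales (both are assumptions, not measurements):

```python
# Figures taken from the memory_profiler output above.
increment_mb = 58.215   # increment reported for the 10000-image loop
images = 10000

# Memory left behind per image, in KiB.
per_image_kib = increment_mb * 1024 / images
print(round(per_image_kib, 2))  # 5.96

# Projection for ten million images, in GiB, assuming linear growth.
projected_gib = per_image_kib * 10000000 / 1024 ** 2
print(round(projected_gib, 1))  # 56.9
```

At roughly 57 GiB for ten million images, the per-instance cost matters even though it is small for one object.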

As mentioned above, it seems Python's garbage collector is unable to clean up all references. I did try to implement my own __del__ method, something like:

def __del__(self):
    del self._cstruct
    del self

Unfortunately, this doesn't seem to fix the issue. After spending a day researching and trying several memory debuggers, Stack Overflow seems to be my last resort. Many thanks for your valuable thoughts and suggestions.
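Why the del self line cannot help: inside a method, self is an ordinary local variable, so deleting it only unbinds that name. Moreover, __del__ runs only after the last reference to the instance is gone, so it can release what the instance itself references, but not references held elsewhere (for example, in a module-level cache). The sketch below uses a hypothetical toy class, Example, with the same __del__ body, and a weak reference to watch the object's lifetime.

```python
import weakref


class Example(object):
    """Hypothetical toy class reusing the __del__ body from the question."""

    def __del__(self):
        # 'self' is an ordinary local name here: deleting it only unbinds
        # that name and does not affect the object or other references to it.
        del self


obj = Example()
probe = weakref.ref(obj)    # observes the object without keeping it alive

obj.__del__()               # run the method explicitly...
print(probe() is obj)       # True: ...the object is still alive

del obj                     # drop the last strong reference
print(probe())              # None: CPython frees the object right away
```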

asked Oct 21 '22 07:10 by Michael


1 Answer

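It may not be the only issue, but the caching of each _Argtype: LP__Argtype pair in the dict _ctypes._pointer_type_cache is certainly not insignificant. Because _argtype_generator builds a new _Argtype class for every image, and c.pointer() calls POINTER() on the type of its argument, every image adds a new entry to this cache, and the cache keeps both the class and its pointer type alive. Memory usage should go down if you clear the cache.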
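To see the effect in isolation, the following sketch repeats the question's pattern with a hypothetical make_argtype() helper and uses weak references to count how many of the generated classes are still alive after garbage collection. On CPython versions that keep pointer types in the global _pointer_type_cache dict (the situation described here), all 1000 classes should survive; newer releases may cache pointer types differently, so the count can be lower there.

```python
import ctypes as c
import gc
import sys
import weakref


def make_argtype():
    # Same pattern as Image._argtype_generator: a brand-new Structure class per call.
    class _Argtype(c.Structure):
        _fields_ = [("n", c.c_ulong)]
    return _Argtype


def surviving_classes(count):
    """Create `count` throwaway classes and report how many are still alive."""
    refs = []
    for _ in range(count):
        cls = make_argtype()
        c.pointer(cls(n=1))            # pointer() calls POINTER(cls) internally
        refs.append(weakref.ref(cls))  # a weak reference does not keep cls alive
        del cls
    gc.collect()                       # collect the reference cycles each class has
    return sum(ref() is not None for ref in refs)


print(sys.version_info[:2], surviving_classes(1000))
```

Defining the structure class once at module level instead leaves a single cache entry, no matter how many instances are created.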

The pointer and function type caches can be cleared with ctypes._reset_cache(). Bear in mind that clearing the cache can cause problems. For example:

from ctypes import *
import ctypes

c_double_p = POINTER(c_double)
c_double_pp = POINTER(c_double_p)

class Image(Structure): 
    _fields_ = [('row', c_double_pp)]

ctypes._reset_cache()
nc_double_p = POINTER(c_double)
nc_double_pp = POINTER(nc_double_p)

The old pointers still work with Image:

>>> img = Image((c_double_p * 10)()) 
>>> img = Image(c_double_pp(c_double_p(c_double())))

New pointers created after resetting the cache won't work:

>>> img = Image((nc_double_p * 10)())

TypeError: incompatible types, LP_c_double_Array_10 instance 
  instead of LP_LP_c_double instance

>>> img = Image(nc_double_pp(nc_double_p(c_double())))

TypeError: incompatible types, LP_LP_c_double instance 
  instead of LP_LP_c_double instance

If resetting the cache solves your problem, maybe that's good enough. But in general the pointer cache is both necessary and beneficial, so personally I'd look for another way. For example, as far as I can see there's no reason to customize _Argtype for each image. You could just define row as a double ** (POINTER(POINTER(c_double))) and initialize it with the array of row pointers.
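A minimal sketch of that suggestion, not a drop-in replacement: CImage is a hypothetical name for a structure defined once at module level, and only __init__ and _update_cstruct_from_array of the question's Image class are shown. It also lays out the array as (ny, nx), so that each row pointer covers one row of nx pixels; this differs from the question's (nx, ny) only for non-square images.

```python
import ctypes as c
import numpy

c_double_p = c.POINTER(c.c_double)


class CImage(c.Structure):
    # Defined once, at module level, so every Image shares one class and one
    # pointer type. "row" is a plain double ** as in image.h.
    _fields_ = [("data", c_double_p),
                ("row", c.POINTER(c_double_p)),
                ("n", c.c_ulong),
                ("nx", c.c_ulong),
                ("ny", c.c_ulong)]


class Image(object):

    def __init__(self, nx, ny):
        self.nx = nx
        self.ny = ny
        self.n = nx * ny
        # ny rows of nx pixels each, so that row i below is a real image row.
        self.array = numpy.zeros((ny, nx), order='C', dtype=c.c_double)
        self._update_cstruct_from_array()

    def _update_cstruct_from_array(self):
        data_pointer = self.array.ctypes.data_as(c_double_p)
        # ctypes array of ny row pointers, each pointing into self.array.
        row_pointers = (c_double_p * self.ny)(
            *[self.array[i, :].ctypes.data_as(c_double_p) for i in range(self.ny)])
        # A ctypes array is accepted for a POINTER field of the same item type;
        # the structure keeps a reference to row_pointers so it stays alive.
        self._cstruct = c.pointer(CImage(data=data_pointer,
                                         row=row_pointers,
                                         n=self.n,
                                         nx=self.nx,
                                         ny=self.ny))
```

For example, after img = Image(4, 3) and img.array[2, 1] = 7.0, both img._cstruct.contents.data[9] and img._cstruct.contents.row[2][1] read 7.0. Because CImage exists only once, c.pointer() reuses the same pointer type for every image instead of adding a new cache entry per instance.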

answered Oct 27 '22 19:10 by Eryk Sun