Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory leak in Python extension when array is created with PyArray_SimpleNewFromData() and returned

I wrote a simple Python extension module to simulate a 3-bit analog-to-digital converter. It is supposed to accept a floating-point array as its input to return the same size array of output. The output actually consists of quantized input numbers. Here is my (simplified) module:

static PyObject *adc3(PyObject *self, PyObject *args) {
  PyArrayObject *inArray = NULL, *outArray = NULL;
  double *pinp = NULL, *pout = NULL;
  npy_intp nelem;
  int dims[1], i, j;

  /* Get arguments:  */
  if (!PyArg_ParseTuple(args, "O:adc3", &inArray))
    return NULL;

  nelem = PyArray_DIM(inArray,0); /* size of the input array */
  pout = (double *) malloc(nelem*sizeof(double));
  pinp = (double *) PyArray_DATA(inArray);

  /*   ADC action   */
  for (i = 0; i < nelem; i++) {
    if (pinp[i] >= -0.5) {
    if      (pinp[i] < 0.5)   pout[i] = 0;
    else if (pinp[i] < 1.5)   pout[i] = 1;
    else if (pinp[i] < 2.5)   pout[i] = 2;
    else if (pinp[i] < 3.5)   pout[i] = 3;
    else                      pout[i] = 4;
    }
    else {
    if      (pinp[i] >= -1.5) pout[i] = -1;
    else if (pinp[i] >= -2.5) pout[i] = -2;
    else if (pinp[i] >= -3.5) pout[i] = -3;
    else                      pout[i] = -4;
    }
  }

  dims[0] = nelem;

  outArray = (PyArrayObject *)
               PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
  //Py_INCREF(outArray);

  return PyArray_Return(outArray); 
} 

/* ==== methods table ====================== */
static PyMethodDef mwa_methods[] = {
  {"adc", adc, METH_VARARGS, "n-bit Analog-to-Digital Converter (ADC)"},
  {NULL, NULL, 0, NULL}
};

/* ==== Initialize ====================== */
PyMODINIT_FUNC initmwa()  {
    Py_InitModule("mwa", mwa_methods);
    import_array();  // for NumPy
}

I expected that if reference counts were processed correctly, the Python garbage collection would (frequently enough) release the memory used by the output array if it has the same name and is used repeatedly. So I tested it on some dummy (but voluminous) data with this code:

for i in xrange(200): 
    a = rand(1000000)
    b = mwa.adc3(a)
    print i

Here the array named "b" is reused many times and its memory, borrowed by adc3() from the heap, is expected to be returned to the system. I used the gnome-system-monitor to check. Contrary to my expectations, the memory owned by python grew rapidly and could only be released by quitting the program (I use IPython). For comparison, I tried the same procedure with the standard NumPy functions, zeros() and copy():

for i in xrange(1000): 
    a = np.zeros(10000000)
    b = np.copy(a)
    print i

As you can see, the latter code does not make any memory build-up. I read many texts in the standard documentation and on the web, tried to use Py_INCREF(outArray) and not to use it. All in vain: the problem persisted.

However, I found the solution in http://wiki.scipy.org/Cookbook/C_Extensions/NumPy_arrays. The author provides an extension program matsq() that creates an array and returns it. When I tried to use the calls suggested by the author:

outArray = (PyArrayObject *) PyArray_FromDims(nd,dims,NPY_DOUBLE);
pout = (double *) outArray->data;

instead of my

pout = (double *) malloc(nelem*sizeof(double));
outArray = (PyArrayObject *)
            PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, pout);
/* no matter with or without Py_INCREF(outArray)) */

the memory leak gone! The program works properly now.

A question: can anybody explain why PyArray_SimpleNewFromData() does not provide the correct reference counting, while PyArray_FromDims() does?

Thank you very much.

ADDITION. I probably exceeded the room/time in comments, so I add to my comment to Alex here. I tried to set the OWNDATA flag this way:

outArray->flags |= OWNDATA;

but I got "error: ‘OWNDATA’ undeclared". The rest is in the comment. Thank you in advance.


SOLVED: The correct setting of the flag is

outArray->flags |= NPY_ARRAY_OWNDATA;

Now it works.

Alex, sorry.

like image 517
Benkevitch Avatar asked Jan 12 '15 23:01

Benkevitch


1 Answers

The problem is not with PyArray_SimpleNewFromData which produces a properly refcounted PyObject*. Rather, it's with your malloc, assigned to pout then never freed.

As the docs at http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html clearly state, documenting PyArray_SimpleNewFromData:

the ndarray will not own its data. When this ndarray is deallocated, the pointer will not be freed. ... If you want the memory to be freed as soon as the ndarray is deallocated then simply set the OWNDATA flag on the returned ndarray.

(my emphasis on the not). IOW, you're observing exactly the "will not be freed" behavior so clearly documented, and are not taking the step specifically recommended should you want to avoid said behavior.

like image 108
Alex Martelli Avatar answered Oct 27 '22 16:10

Alex Martelli