Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to avoid enormous additional memory consumption when using numpy vectorize?

This code below best illustrates my problem:

The output to the console (NB it takes ~8 minutes to run even the first test) shows the 512x512x512x16-bit array allocations consuming no more than expected (256MByte for each one), and looking at "top" the process generally remains sub-600MByte as expected.

However, while the vectorized version of the function is being called, the process expands to enormous size (over 7GByte!). Even the most obvious explanation I can think of to account for this - that vectorize is converting the inputs and outputs to float64 internally - could only account for a couple of gigabytes, even though the vectorized function returns an int16, and the returned array is certainly an int16. Is there some way to avoid this happening ? Am I using/understanding vectorize's otypes argument wrong ?

import numpy as np
import subprocess

def logmem():
    subprocess.call('cat /proc/meminfo | grep MemFree',shell=True)

def fn(x):
    return np.int16(x*x)

def test_plain(v):
    print "Explicit looping:"
    logmem()
    r=np.zeros(v.shape,dtype=np.int16)
    for z in xrange(v.shape[0]):
        for y in xrange(v.shape[1]):
            for x in xrange(v.shape[2]):
                r[z,y,x]=fn(x)
    print type(r[0,0,0])
    logmem()
    return r

vecfn=np.vectorize(fn,otypes=[np.int16])

def test_vectorize(v):
    print "Vectorize:"
    logmem()
    r=vecfn(v)
    print type(r[0,0,0])
    logmem()
    return r

logmem()    
s=(512,512,512)
v=np.ones(s,dtype=np.int16)
logmem()
test_plain(v)
test_vectorize(v)
v=None
logmem()

I'm using whichever versions of Python/numpy are current on an amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).

like image 853
timday Avatar asked Aug 16 '11 12:08

timday


People also ask

Is NumPy vectorize faster than for loop?

Vectorized implementations (numpy) are much faster and more efficient as compared to for-loops. To really see HOW large the difference is, let's try some simple operations used in most machine learnign algorithms (especially deep learning).

Why is NumPy vectorize fast?

A major reason why vectorization is faster than its for loop counterpart is due to the underlying implementation of Numpy operations. As many of you know (if you're familiar with Python), Python is a dynamically typed language.

What advantage does vectorization with NumPy arrays provide?

The concept of vectorized operations on NumPy allows the use of more optimal and pre-compiled functions and mathematical operations on NumPy array objects and data sequences. The Output and Operations will speed up when compared to simple non-vectorized operations.

Do we want to vectorize in python Why or why not?

Vectorization in Python, as implemented by NumPy, can give you faster operations by using fast, low-level code to operate on bulk data. And Pandas builds on NumPy to provide similarly fast functionality.


2 Answers

It is a basic problem of vectorisation that all intermediate values are also vectors. While this is a convenient way to get a decent speed enhancement, it can be very inefficient with memory usage, and will be constantly thrashing your CPU cache. To overcome this problem, you need to use an approach which has explicit loops running at compiled speed, not at python speed. The best ways to do this are to use cython, fortran code wrapped with f2py or numexpr. You can find a comparison of these approaches here, although this focuses more on speed than memory usage.

like image 86
DaveP Avatar answered Oct 23 '22 05:10

DaveP


you can read the source code of vectorize(). It convert the array's dtype to object, and call np.frompyfunc() to create the ufunc from your python function, the ufunc returns object array, and finally vectorize() convert object array to int16 array.

It will use many memory when the dtype of array is object.

Using python function to do element wise calculation is slow, even is's converted to ufunc by frompyfunc().

like image 23
HYRY Avatar answered Oct 23 '22 03:10

HYRY