The code below best illustrates my problem:
The console output (NB it takes ~8 minutes to run even the first test) shows the 512x512x512 int16 array allocations consuming no more than expected (256 MByte each), and according to "top" the process generally stays below 600 MByte, as expected.
However, while the vectorized version of the function is being called, the process expands to an enormous size (over 7 GByte!). Even the most obvious explanation I can think of - that vectorize is converting the inputs and outputs to float64 internally - could only account for a couple of gigabytes, and in any case the vectorized function returns an int16 and the returned array is certainly int16. Is there some way to avoid this happening? Am I using or understanding vectorize's otypes argument wrongly?
import numpy as np
import subprocess

def logmem():
    # Report free system memory at this point in the run
    subprocess.call('cat /proc/meminfo | grep MemFree', shell=True)

def fn(x):
    return np.int16(x*x)

def test_plain(v):
    print "Explicit looping:"
    logmem()
    r = np.zeros(v.shape, dtype=np.int16)
    for z in xrange(v.shape[0]):
        for y in xrange(v.shape[1]):
            for x in xrange(v.shape[2]):
                r[z, y, x] = fn(v[z, y, x])
    print type(r[0, 0, 0])
    logmem()
    return r

vecfn = np.vectorize(fn, otypes=[np.int16])

def test_vectorize(v):
    print "Vectorize:"
    logmem()
    r = vecfn(v)
    print type(r[0, 0, 0])
    logmem()
    return r

logmem()
s = (512, 512, 512)
v = np.ones(s, dtype=np.int16)   # 256 MByte of int16
logmem()
test_plain(v)
test_vectorize(v)
v = None
logmem()
I'm using whichever versions of Python/numpy are current on an amd64 Debian Squeeze system (Python 2.6.6, numpy 1.4.1).
Some general background before the answers: vectorized NumPy operations are usually much faster than explicit Python for-loops because they dispatch to pre-compiled, low-level code instead of going through the dynamically typed Python interpreter element by element. This is why vectorization is the standard approach in numerical code and most machine learning workloads, and why Pandas, which builds on NumPy, gets similarly fast bulk operations. The sketch below illustrates the size of the difference.
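A minimal timing sketch of that gap (the array size and the use of np.dot here are illustrative choices, not from the thread):

import time
import numpy as np

v = np.random.rand(10**7)

# Interpreted Python loop: one bytecode dispatch and boxing per element.
t0 = time.time()
s = 0.0
for x in v:
    s += x * x
t_loop = time.time() - t0

# Vectorized equivalent: a single pre-compiled loop over the whole array.
t0 = time.time()
s2 = np.dot(v, v)
t_vec = time.time() - t0

print("loop: %.3fs  vectorized: %.3fs" % (t_loop, t_vec))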
It is a basic problem of vectorisation that all intermediate values are also vectors. While this is a convenient way to get a decent speed enhancement, it can be very inefficient with memory usage, and will be constantly thrashing your CPU cache. To overcome this problem, you need an approach whose explicit loops run at compiled speed, not at Python speed. The best ways to do this are to use Cython, Fortran code wrapped with f2py, or numexpr. You can find a comparison of these approaches here, although it focuses more on speed than memory usage.
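To make the numexpr route concrete, here is a minimal sketch for the question's fn (numexpr and its evaluate() call are real, but this particular usage is my illustration, not from the answer; it assumes numexpr is installed):

import numpy as np
import numexpr as ne

v = np.ones((512, 512, 512), dtype=np.int16)

# numexpr compiles the expression and evaluates it in small,
# cache-sized chunks, so it never materialises a full-size
# temporary or an object array.
r = ne.evaluate('v * v')

# numexpr may promote small integer types, so cast back
# if the exact dtype matters:
r = r.astype(np.int16)

For a function as simple as this one, plain NumPy already suffices: v * v on an int16 array stays int16 and allocates only the one result array; np.vectorize is only worth reaching for when the function cannot be expressed in array operations at all.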
You can read the source code of vectorize(). It converts the input array's dtype to object and calls np.frompyfunc() to create a ufunc from your Python function; that ufunc returns an object array, and finally vectorize() converts the object array to an int16 array.
An array whose dtype is object uses a great deal of memory, because every element is a separately allocated Python object (an 8-byte pointer in the array plus the boxed value it points to).
Calling a Python function to do element-wise calculation is also slow, even when it has been converted to a ufunc by frompyfunc().
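A small sketch of that mechanism (using a 512x512 array rather than the question's full 512^3 one, so it runs quickly):

import numpy as np

def fn(x):
    return np.int16(x * x)

v = np.ones((512, 512), dtype=np.int16)

# This is essentially what vectorize() builds internally.
uf = np.frompyfunc(fn, 1, 1)
obj = uf(v)

print(obj.dtype)    # object: one boxed Python value per element
print(v.nbytes)     # 2 bytes per element
print(obj.nbytes)   # 8 bytes per element just for the pointers,
                    # plus the boxed scalar objects they refer to

r = obj.astype(np.int16)   # the final cast that vectorize() performs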