 

What is the preferred way to preallocate NumPy arrays?

Tags:

python

numpy

I am new to NumPy/SciPy. From the documentation, it seems more efficient to preallocate a single array rather than call append/insert/concatenate.

For example, to add a column of 1's to an array, I think that this:

import numpy as np
ar0 = np.linspace(10, 20, 16).reshape(4, 4)  # allocate the full 4x4 array up front
ar0[:, -1] = np.ones_like(ar0[:, 0])         # overwrite the last column with 1's

is preferred to this:

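import numpy as np
ar0 = np.linspace(10, 20, 12).reshape(4, 3)  # start from a 4x3 array
# np.insert returns a new 4x4 array with a column of 1's after the last column
ar0 = np.insert(ar0, ar0.shape[1], np.ones_like(ar0[:, 0]), axis=1)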

My first question is whether this is correct (that the first is better). My second question: at the moment I am just preallocating my arrays like this (which I noticed in several of the Cookbook examples on the SciPy site):

np.zeros((8,5)) 

What is the 'NumPy-preferred' way to do this?

kim busyn asked Aug 16 '10 09:08




1 Answer

Preallocation mallocs all the memory you need in one call, while resizing the array (through calls to append, insert, concatenate or resize) may require copying the array to a larger block of memory; in fact, np.append, np.insert and np.concatenate always allocate and return a new array, so every call copies the existing data. So you are correct: preallocation is preferred over (and should be faster than) resizing.
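A minimal sketch contrasting the two approaches. The helper names build_by_append and build_preallocated are hypothetical, chosen only for illustration. Both build the same array, but the first one calls np.append on every iteration, which allocates a new buffer and copies everything added so far.

```python
import numpy as np


def build_by_append(n_rows, n_cols):
    """Grow an array one row at a time (hypothetical helper)."""
    result = np.empty((0, n_cols))
    for i in range(n_rows):
        row = np.full((1, n_cols), float(i))
        # np.append allocates a new array and copies `result` into it every time.
        result = np.append(result, row, axis=0)
    return result


def build_preallocated(n_rows, n_cols):
    """Allocate the final array once, then fill it in place (hypothetical helper)."""
    result = np.empty((n_rows, n_cols))
    for i in range(n_rows):
        result[i, :] = i
    return result


# np.append never modifies its input: the result lives in a separate buffer.
a = np.zeros(3)
b = np.append(a, 1.0)
print(a.shape, b.shape, np.shares_memory(a, b))  # (3,) (4,) False
```

Both functions return the same (n_rows, n_cols) array; because build_by_append re-copies all previous rows on each iteration, its total copying grows quadratically with n_rows, while build_preallocated allocates once.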

There are a number of "preferred" ways to preallocate NumPy arrays, depending on what you want to create: np.zeros, np.ones, np.empty, np.zeros_like, np.ones_like and np.empty_like, plus many others that create useful arrays, such as np.linspace and np.arange. (np.empty skips initialization, so its contents are arbitrary until you assign to them.)
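A short sketch of the constructors listed above, using the (8, 5) shape from the question. Each call allocates its result once; the *_like variants take their shape and dtype from an existing array.

```python
import numpy as np

shape = (8, 5)

z = np.zeros(shape)            # all 0.0, dtype float64 by default
o = np.ones(shape, dtype=int)  # all 1, integer dtype
e = np.empty(shape)            # uninitialized memory: fill it before reading
e[...] = 0.5                   # fill every element in place

# The *_like variants copy shape and dtype from an existing array.
z2 = np.zeros_like(o)          # shape (8, 5), same integer dtype as `o`
o2 = np.ones_like(z)           # shape (8, 5), float64 like `z`

# Range-style constructors also allocate the whole result in one go.
lin = np.linspace(10, 20, 16).reshape(4, 4)  # 16 evenly spaced values, 10 to 20
ar = np.arange(20).reshape(4, 5)             # 0, 1, ..., 19
```

np.zeros is a reasonable default when you want a known starting value; np.empty only pays off when every element is about to be overwritten anyway.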

So

ar0 = np.linspace(10, 20, 16).reshape(4, 4) 

is just fine if this comes closest to the ar0 you desire.

However, to make the last column all 1's, I think the preferred way would be to just say

ar0[:,-1]=1 

Since the shape of ar0[:,-1] is (4,), the scalar 1 is broadcast to match this shape.
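A minimal sketch checking that scalar assignment gives the same result as the np.ones_like version from the question, and showing one way to "add a column" by preallocating the wider array instead of calling np.insert. The names ar1, data and out are illustrative only.

```python
import numpy as np

ar0 = np.linspace(10, 20, 16).reshape(4, 4)
ar0[:, -1] = 1                          # scalar 1 broadcast to shape (4,)

# Same result, but builds a temporary (4,) array of ones first.
ar1 = np.linspace(10, 20, 16).reshape(4, 4)
ar1[:, -1] = np.ones_like(ar1[:, 0])

# If you already have a (4, 3) array and need a column of 1's after it,
# preallocate the (4, 4) result once and copy the data into it.
data = np.linspace(10, 20, 12).reshape(4, 3)
out = np.empty((data.shape[0], data.shape[1] + 1), dtype=data.dtype)
out[:, :-1] = data                      # copy the existing columns
out[:, -1] = 1                          # broadcast 1 into the new last column
```

The last pattern produces the same array as np.insert(data, data.shape[1], 1, axis=1), but it makes the single allocation explicit, which matters most when this step sits inside a loop.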

unutbu answered Sep 22 '22 20:09