
Growing matrices columnwise in NumPy


In pure Python you can grow matrices column by column pretty easily:

data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColumn)

NumPy arrays don't have an append method. The hstack function doesn't work on zero-sized arrays, so the following won't work:

data = numpy.array([])
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data = numpy.hstack((data, newColumn)) # ValueError: arrays must have same number of dimensions

So, my options are either to move the initialization inside the loop with an appropriate condition:

data = None
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    if data is None:
        data = newColumn
    else:
        data = numpy.hstack((data, newColumn)) # works

... or to use a Python list and convert it to an array later:

data = []
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data.append(newColumn)
data = numpy.array(data)

Both variants seem a little bit awkward to me. Are there nicer solutions?

asked Nov 23 '09 13:11 by Boris Gorelik




2 Answers

NumPy actually does have an append function, which it seems might do what you want, e.g.,

import numpy as NP
my_data = NP.random.randint(0, 10, 9).reshape(3, 3)  # random_integers is deprecated; randint's upper bound is exclusive
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)

Your second snippet (hstack) will work if you add another line, e.g.,

my_data = NP.random.randint(0, 10, 16).reshape(4, 4)  # randint's upper bound is exclusive
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))

For 2D arrays, hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure how they compare performance-wise.
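One way to check that equivalence for 2D inputs is to compare the calls directly. This is only a minimal sketch: the array contents are arbitrary illustration values, and the variable names other than my_data and new_col are not from the answer.

```python
import numpy as np

# Arbitrary 3x3 example data and a 3x1 column to attach.
my_data = np.arange(9).reshape(3, 3)
new_col = np.array([5, 5, 5]).reshape(3, 1)

via_append = np.append(my_data, new_col, axis=1)
via_hstack = np.hstack((my_data, new_col))
via_concat = np.concatenate((my_data, new_col), axis=1)

# For 2D inputs, all three calls produce the same 3x4 array.
same = (np.array_equal(via_append, via_hstack)
        and np.array_equal(via_hstack, via_concat))
print(via_concat.shape, same)
```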


While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:

Initializing a NumPy array is relatively expensive, and with this conventional Python pattern you incur that cost, more or less, at each loop iteration (i.e., each append to a NumPy array is roughly like initializing a new array with a different size).
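The reason is that np.append never grows an array in place: it allocates a new array and copies the existing data into it. A minimal sketch that makes this visible (the shapes and the names data and grown are arbitrary choices for illustration):

```python
import numpy as np

data = np.zeros((3, 1))
grown = np.append(data, np.ones((3, 1)), axis=1)

# np.append returned a brand-new array; the original is untouched
# and the two do not share any memory.
print(data.shape)                      # (3, 1)
print(grown.shape)                     # (3, 2)
print(np.shares_memory(data, grown))   # False
```

Inside a loop, that allocate-and-copy step is repeated on every iteration, with a little more data to copy each time.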

For that reason, the common pattern in NumPy for iteratively adding columns to a 2D array is to initialize an empty target array once (that is, pre-allocate a single 2D NumPy array that already has all of the empty columns) and then successively populate those columns by setting the desired column-wise offset (index). This is much easier to show than to explain:

>>> # initialize your skeleton array using 'empty', which skips filling in initial values
>>> M = NP.empty(shape=(10, 5), dtype=float)

>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v : NP.random.randint(0, 10, v)

Populate the NumPy array as in the OP, except that each iteration just sets the values of M at successive column-wise offsets:

>>> for index in range(5):
...     M[:, index] = fnx(10)

>>> M
  array([[ 1.,  7.,  0.,  8.,  7.],
         [ 9.,  0.,  6.,  9.,  4.],
         [ 2.,  3.,  6.,  3.,  4.],
         [ 3.,  4.,  1.,  0.,  5.],
         [ 2.,  3.,  5.,  3.,  0.],
         [ 4.,  6.,  5.,  6.,  2.],
         [ 0.,  6.,  1.,  6.,  8.],
         [ 3.,  8.,  0.,  8.,  0.],
         [ 5.,  2.,  5.,  0.,  1.],
         [ 0.,  6.,  5.,  9.,  1.]])

Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the 'unused' portions when you finish populating it:

>>> M[:3,:3]
  array([[ 1.,  7.,  0.],
         [ 9.,  0.,  6.],
         [ 2.,  3.,  6.]])
answered Sep 18 '22 06:09 by doug


Usually you don't keep resizing a NumPy array when you create it. What don't you like about your third solution? If it's a very large matrix/array, then it might be worth allocating the array before you start assigning its values:

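x = len(something)  # number of columns
y = getColumnDataAsNumpyArray.someLengthProperty  # placeholder: length of each column

data = numpy.zeros((y, x))  # one column per item in 'something'
for j, i in enumerate(something):
    # numpy.ravel flattens the column so it fits the 1-D slice data[:, j]
    data[:, j] = numpy.ravel(getColumnDataAsNumpyArray(i))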
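When the number of columns is not known up front, the list-based approach from the question remains a reasonable choice, as long as the conversion at the end treats each list item as a column. A minimal sketch, assuming each column arrives as a 1-D array; get_column is a hypothetical stand-in for getColumnDataAsNumpyArray:

```python
import numpy as np

def get_column(i):
    # Hypothetical stand-in for getColumnDataAsNumpyArray.
    return np.array([i, i * 10, i * 100])

# Appending to a Python list is cheap; the array is built only once.
columns = [get_column(i) for i in (1, 2, 3)]

# column_stack places each 1-D array in its own column, whereas
# np.array(columns) would place each one in its own row.
data = np.column_stack(columns)
print(data.shape)            # (3, 3)
print(data[:, 1].tolist())   # [2, 20, 200]
```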
answered Sep 21 '22 06:09 by Paul