Growing matrices columnwise in NumPy

In pure Python you can grow matrices column by column pretty easily:

data = []
for i in something:
    newColumn = getColumnDataAsList(i)
    data.append(newColumn)

NumPy arrays don't have an append method. The hstack function doesn't work on zero-sized arrays, so the following won't work:

data = numpy.array([])
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data = numpy.hstack((data, newColumn)) # ValueError: arrays must have same number of dimensions

So, my options are either to move the initialization inside the loop with an appropriate condition:

data = None
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    if data is None:
        data = newColumn
    else:
        data = numpy.hstack((data, newColumn)) # works

... or to use a Python list and convert it to an array later:

data = []
for i in something:
    newColumn = getColumnDataAsNumpyArray(i)
    data.append(newColumn)
data = numpy.array(data)

Both variants seem a little bit awkward to me. Are there nicer solutions?

662

asked Nov 23 '09 13:11

Boris Gorelik

2 Answers

NumPy actually does have an append function, which it seems might do what you want, e.g.,

import numpy as NP
# randint's upper bound is exclusive; random_integers is deprecated
my_data = NP.random.randint(0, 10, 9).reshape(3, 3)
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)

Your second snippet (hstack) will work if you add another line, e.g.,

my_data = NP.random.randint(0, 10, 16).reshape(4, 4)
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))

hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure how they compare performance-wise.
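
The failure described in the question can also be avoided by starting from a 2-D array with zero columns instead of `numpy.array([])`. A minimal sketch, assuming every column has the same length and that length is known; `n_rows` and `get_column` are hypothetical stand-ins for the asker's data source:

```python
import numpy as np

n_rows = 4  # assumed: the column length is known up front

def get_column(i):
    # hypothetical stand-in for getColumnDataAsNumpyArray(i)
    return np.full((n_rows, 1), i, dtype=float)

# A 2-D array with zero columns has the right number of dimensions for hstack.
data = np.empty((n_rows, 0))
for i in range(3):
    data = np.hstack((data, get_column(i)))

print(data.shape)  # (4, 3)
```

This removes the special case for the first iteration, but it still copies the whole array on every pass through the loop.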

While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:

Initializing a NumPy array is relatively expensive, and with this conventional Python pattern you incur that cost, more or less, at each loop iteration (i.e., each append to a NumPy array allocates a new array of a different size and copies the existing data into it).
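
To make that cost concrete, here is a small sketch showing that `append` does not grow an array in place: it returns a freshly allocated array and leaves the input untouched. The variable names are illustrative only.

```python
import numpy as np

original = np.zeros((3, 2))
new_col = np.ones((3, 1))

grown = np.append(original, new_col, axis=1)

print(original.shape)                     # (3, 2): the input is unchanged
print(grown.shape)                        # (3, 3)
print(np.shares_memory(original, grown))  # False: the data was copied
```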

For that reason, the common pattern in NumPy for iteratively adding columns to a 2D array is to initialize an empty target array once (that is, pre-allocate a single 2D NumPy array with all of its columns still empty) and then successively populate those columns by assigning to the desired column index. It is much easier to show than to explain:

>>> # initialize your skeleton array using 'empty', which skips filling in initial values
>>> M = NP.empty(shape=(10, 5), dtype=float)

>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v : NP.random.randint(0, 10, v)

Populate the NumPy array as in the OP, except that each iteration just sets the values of M at successive column offsets:

>>> for index in range(5):
...     M[:, index] = fnx(10)

>>> M
  array([[ 1.,  7.,  0.,  8.,  7.],
         [ 9.,  0.,  6.,  9.,  4.],
         [ 2.,  3.,  6.,  3.,  4.],
         [ 3.,  4.,  1.,  0.,  5.],
         [ 2.,  3.,  5.,  3.,  0.],
         [ 4.,  6.,  5.,  6.,  2.],
         [ 0.,  6.,  1.,  6.,  8.],
         [ 3.,  8.,  0.,  8.,  0.],
         [ 5.,  2.,  5.,  0.,  1.],
         [ 0.,  6.,  5.,  9.,  1.]])

Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the unused portion when you finish populating it:

>>> M[:3,:3]
  array([[ 1.,  7.,  0.],
         [ 9.,  0.,  6.],
         [ 2.,  3.,  6.]])
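
A sketch of the over-allocate-and-trim approach, assuming you can pick an upper bound on the number of columns. The names `max_cols` and `n_filled` and the seven-item loop are made up for illustration:

```python
import numpy as np

n_rows, max_cols = 10, 100           # assumed upper bound on the column count
M = np.empty((n_rows, max_cols))     # values are uninitialized until assigned

n_filled = 0
for i in range(7):                   # stand-in for a source of unknown length
    M[:, n_filled] = np.arange(n_rows) * i
    n_filled += 1

# Keep only the populated columns; copy() lets the oversized buffer be freed.
result = M[:, :n_filled].copy()
print(result.shape)  # (10, 7)
```

Tracking the number of filled columns explicitly matters here, because the unused part of an array created with `empty` contains arbitrary values and cannot be detected after the fact.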

141

answered Sep 18 '22 06:09

doug

Usually you don't keep resizing a NumPy array when you create it. What don't you like about your third solution? If it's a very large matrix/array, then it might be worth allocating the array before you start assigning its values:

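n_cols = len(something)
# placeholder: however you obtain the length of a single column
n_rows = getColumnDataAsNumpyArray.someLengthProperty

data = numpy.zeros((n_rows, n_cols))
# enumerate gives the column position even when the items are not integers
for j, i in enumerate(something):
    data[:, j] = getColumnDataAsNumpyArray(i)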
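
For comparison, the list-based variant from the question can be finished with `numpy.column_stack`, which turns a list of 1-D arrays into the columns of a 2-D array (calling `numpy.array` on the list would make them rows instead). A minimal sketch; `get_column` is a hypothetical stand-in for `getColumnDataAsNumpyArray`:

```python
import numpy as np

def get_column(i):
    # hypothetical stand-in returning a 1-D array of length 4
    return np.arange(4) + 10 * i

columns = []
for i in range(3):
    columns.append(get_column(i))   # list append does not copy the array data

data = np.column_stack(columns)     # the 2-D array is built once, at the end
print(data.shape)  # (4, 3)
```

This is a reasonable choice when neither the number of columns nor an upper bound on it is known before the loop starts.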

answered Sep 21 '22 06:09

Paul
