
nested Python numpy arrays dimension confusion

Suppose I have a numpy array c constructed as follows:

a = np.zeros((2,4))
b = np.zeros((2,8))
c = np.array([a,b])

I would have expected c.shape to be (2,1) or (2,) but instead it is (2,2). Additionally, what I want to do is concatenate a column vector of ones onto a, but by accessing it through c in the following way:

c0 = c[0] # I would have expected this to be 'a'
np.concatenate((np.ones((c0.shape[0], 1)), c0), axis=1)

This of course doesn't work because c[0] does not equal a as I expected, and I get

ValueError: all the input arrays must have same number of dimensions

I need some way to have an array (or list) of pairs, each pair component being a numpy array, and I need to access the first array in the pair in order to concatenate a column vector of ones to it. My application is machine learning and my data will be coming to me in the format described, but I need to modify the data at the start in order to add a bias element to it.

EDIT: I'm using Python 2.7 and Numpy 1.8.2

adamconkey asked Jul 18 '15 16:07



2 Answers

Generally, NumPy arrays that contain other NumPy arrays are not very useful. If you are using NumPy for speed, it is usually best to stick with arrays that have a homogeneous, basic numeric dtype.

To place two items in a data structure such that c[0] returns the first item, and c[1] the second, a list (or tuple) such as c = [a, b] will do.
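As a minimal sketch of the list approach, assuming the same `a` and `b` as in the question (the names `bias` and `a_with_bias` are introduced here only for illustration): indexing the list returns the original array, so the concatenation from the question works unchanged.

```python
import numpy as np

a = np.zeros((2, 4))
b = np.zeros((2, 8))

# A plain list keeps each array intact; no new array is built from a and b.
c = [a, b]

c0 = c[0]  # this is the very same object as a
bias = np.ones((c0.shape[0], 1))  # column of ones, one entry per row of a
a_with_bias = np.concatenate((bias, c0), axis=1)

print(c0 is a)            # True
print(a_with_bias.shape)  # (2, 5)
```

If the data arrives as many such pairs, the same steps can be applied to the first element of each pair in a loop or list comprehension.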


By the way, if you are using the statsmodels package, then you can add a constant column with sm.add_constant:

import numpy as np
import statsmodels.api as sm

a = np.random.randint(10, size=(2,4))
print(a)
# [[2 3 9 6]
#  [0 2 1 1]]
print(sm.add_constant(a))
# [[ 1.  2.  3.  9.  6.]
#  [ 1.  0.  2.  1.  1.]]

Note however that if a already contains a constant column, no extra column is added:

In [126]: sm.add_constant(np.zeros((2,4)))
Out[126]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
unutbu answered Oct 23 '22 01:10


I believe what you want to use is hstack:

import numpy as np

a = np.zeros((2,4))  # 4 column vectors of length 2
b = np.ones((2,1))   # 1 column vector of length 2

c = np.hstack((a, b))  # appends b as the last column of a
print(c)
# [[ 0.  0.  0.  0.  1.]
#  [ 0.  0.  0.  0.  1.]]

Regarding the problem of combining your a and b into one array: this cannot be done in an obvious way. np.array([a, b]) tries to stack the two arrays along an additional dimension, which only works if they have the same shape. Yours have shapes (2,4) and (2,8), so they do not fit on top of one another.
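The shape mismatch can be seen directly in a short sketch. It assumes the `a` and `b` from the question and uses `np.stack`, which is only available in NumPy 1.10 and later (so not in the 1.8.2 mentioned in the question); the names `side_by_side` and `stack_error` are illustrative.

```python
import numpy as np

a = np.zeros((2, 4))
b = np.zeros((2, 8))

# Joining along an existing axis only needs the other axes to match:
# both arrays have 2 rows, so they can sit side by side.
side_by_side = np.concatenate((a, b), axis=1)
print(side_by_side.shape)  # (2, 12)

# Stacking along a new axis needs identical shapes, so this raises.
try:
    np.stack((a, b))
    stack_error = None
except ValueError as exc:
    stack_error = exc
print(type(stack_error).__name__)  # ValueError
```

Joining along axis 1 merges the two arrays into one block, so the pair structure is lost; if a and b must stay separate, a list or tuple is the simpler container.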

Dux answered Oct 23 '22 01:10