I was growing confused during the development of a small Python script involving matrix operations, so I fired up a shell to play around with a toy example and develop a better understanding of matrix indexing in Numpy.
This is what I did:
>>> import numpy as np
>>> A = np.matrix([1,2,3])
>>> A
matrix([[1, 2, 3]])
>>> A[0]
matrix([[1, 2, 3]])
>>> A[0][0]
matrix([[1, 2, 3]])
>>> A[0][0][0]
matrix([[1, 2, 3]])
>>> A[0][0][0][0]
matrix([[1, 2, 3]])
As you can imagine, this has not helped me develop a better understanding of matrix indexing in Numpy. This behavior would make sense for something that I would describe as "An array of itself", but I doubt anyone in their right mind would choose that as a model for matrices in a scientific library.
What is, then, the logic to the output I obtained? Why would the first element of a matrix object be itself?
PS: I know how to obtain the first entry of the matrix. What I am interested in is the logic behind this design decision.
EDIT: I'm not asking how to access a matrix element, or why a matrix row behaves like a matrix. I'm asking for a definition of the behavior of a matrix when indexed with a single number. It's an action typical of arrays, but the resulting behavior is nothing like the one you would expect from an array. I would like to know how this is implemented and what's the logic behind the design decision.
Array indexing is the same as accessing an array element. You can access an array element by referring to its index number. The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the second has index 1 etc.
Two-dimensional (2D) arrays are indexed by two subscripts, one for the row and one for the column. Each element in the 2D array must by the same type, either a primitive type or object type.
To construct a matrix in numpy we list the rows of the matrix in a list and pass that list to the numpy array constructor. The first slice selects all rows in A, while the second slice selects just the middle entry in each row. To do a matrix multiplication or a matrix-vector multiplication we use the np. dot() method.
Look at the shape after indexing:
In [295]: A=np.matrix([1,2,3])
In [296]: A.shape
Out[296]: (1, 3)
In [297]: A[0]
Out[297]: matrix([[1, 2, 3]])
In [298]: A[0].shape
Out[298]: (1, 3)
The key to this behavior is that np.matrix
is always 2d. So even if you select one row (A[0,:]
), the result is still 2d, shape (1,3)
. So you can string along as many [0]
as you like, and nothing new happens.
What are you trying to accomplish with A[0][0]
? The same as A[0,0]
?
For the base np.ndarray
class these are equivalent.
Note that Python
interpreter translates indexing to __getitem__
calls.
A.__getitem__(0).__getitem__(0)
A.__getitem__((0,0))
[0][0]
is 2 indexing operations, not one. So the effect of the second [0]
depends on what the first produces.
For an array A[0,0]
is equivalent to A[0,:][0]
. But for a matrix, you need to do:
In [299]: A[0,:][:,0]
Out[299]: matrix([[1]]) # still 2d
=============================
"An array of itself", but I doubt anyone in their right mind would choose that as a model for matrices in a scientific library.
What is, then, the logic to the output I obtained? Why would the first element of a matrix object be itself?
In addition, A[0,:] is not the same as A[0]
In light of these comments let me suggest some clarifications.
A[0]
does not mean 'return the 1st element'. It means select along the 1st axis. For a 1d array that means the 1st item. For a 2d array it means the 1st row. For ndarray
that would be a 1d array, but for a matrix
it is another matrix
. So for a 2d array or matrix, A[i,:]
is the same thing as A[i]
.
A[0]
does not just return itself. It returns a new matrix. Different id
:
In [303]: id(A)
Out[303]: 2994367932
In [304]: id(A[0])
Out[304]: 2994532108
It may have the same data, shape and strides, but it's a new object. It's just as unique as the ith
row of a many row matrix.
Most of the unique matrix
activity is defined in: numpy/matrixlib/defmatrix.py
. I was going to suggest looking at the matrix.__getitem__
method, but most of the action is performed in np.ndarray.__getitem__
.
np.matrix
class was added to numpy
as a convenience for old-school MATLAB programmers. numpy
arrays can have almost any number of dimensions, 0, 1, .... MATLAB allowed only 2, though a release around 2000 generalized it to 2 or more.
Imagine you have the following
>> A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
If you want to get the second column value, use the following:
>> A.T[1]
array([ 2, 6, 10])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With