Why does NumPy have shape (n,) instead of only (n, 1)? [duplicate]

I have been curious about this for some time. I can live with it, but it bites me whenever I am not careful enough, so I decided to post it here. Consider the following example (NumPy version 1.8.2):

import numpy as np

a = np.array([[0, 1], [2, 3]])
print(np.shape(a[0:0, :]))    # (0, 2)
print(np.shape(a[0:1, :]))    # (1, 2)
print(np.shape(a[0:2, :]))    # (2, 2)
print(np.shape(a[0:100, :]))  # (2, 2)

print(np.shape(a[0]))     # (2,)
print(np.shape(a[0, :]))  # (2,)
print(np.shape(a[:, 0]))  # (2,)

I don't know how other people feel, but the result feels inconsistent to me. The last line is a column vector while the second-to-last line is a row vector, so they should have different shapes; in linear algebra they do! (The a[0:100, :] line is another surprise, but I will ignore it for now.) Consider a second example:
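To make the first example concrete, here is a minimal sketch of why the three last lines agree: an integer index removes the axis it indexes, while a length-one slice keeps it. The variable names are illustrative only and are not from the question.

```python
import numpy as np

a = np.array([[0, 1], [2, 3]])

# An integer index removes the axis it indexes, so the result is 1-D.
row_1d = a[0, :]
col_1d = a[:, 0]

# A length-one slice keeps the axis, so the result stays 2-D.
row_2d = a[0:1, :]
col_2d = a[:, 0:1]

# np.newaxis re-inserts an axis into a 1-D result.
col_from_1d = a[:, 0][:, np.newaxis]

print(row_1d.shape, col_1d.shape)   # (2,) (2,)
print(row_2d.shape, col_2d.shape)   # (1, 2) (2, 1)
print(col_from_1d.shape)            # (2, 1)
```

So when a row or column with an explicit orientation is wanted, slicing with 0:1 instead of indexing with 0 is one way to get it.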

import numpy as np
from scipy.sparse.linalg import spsolve

solution = spsolve(A, b)                 # shape (n,)
analytic = np.reshape(f(x, y), (n, 1))   # shape (n, 1)
error = solution - analytic              # broadcasts to shape (n, n)

Now error has shape (n, n). Yes, when reshaping analytic I should use (n,) instead of (n, 1), but why? I used to use MATLAB a lot, where every vector is two-dimensional, either (n, 1) or (1, n) (linspace, for example, returns a row of shape (1, n)), and a shape like (n,) never exists. But in NumPy (n, 1) and (n,) coexist, and there are many functions for dimension handling alone: the atleast_*d family, newaxis, and different uses of reshape. To me those functions cause more confusion than they resolve. If an array prints as [1, 2, 3], then intuitively its shape should be (1, 3) rather than (3,), right? If NumPy did not have (n,), I can only see a gain in clarity, not a loss in functionality.
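The (n, n) result comes from broadcasting. The sketch below reproduces it with small stand-in arrays; the solver is deliberately left out, so solution and analytic here are made-up data that only share the shapes from the question.

```python
import numpy as np

n = 4
solution = np.arange(n, dtype=float)                # shape (n,), stand-in for the solver output
analytic = np.arange(n, dtype=float).reshape(n, 1)  # shape (n, 1)

# (n,) is treated as (1, n), then both operands are stretched to (n, n).
error_broadcast = solution - analytic

# Flattening the (n, 1) operand first gives the intended elementwise difference.
error = solution - analytic.ravel()

print(error_broadcast.shape)  # (4, 4)
print(error.shape)            # (4,)
```

Reshaping solution to (n, 1) instead would work just as well; the point is only that both operands need the same number of dimensions before subtracting.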

So there must be some design reason behind this. I have searched from time to time without finding a clear answer or reference. Could someone help clarify this confusion or point me to some useful references? Your help is much appreciated.

Taozi asked Jan 09 '15 17:01




1 Answer

NumPy's philosophy is not that a[:, 0] is a "column vector" and a[0, :] a "row vector" in the general case. Rather, they are both, quite simply, vectors, i.e. arrays with one and only one dimension. This is actually highly logical and consistent (but yes, it can get annoying for those of us accustomed to MATLAB).

I say "in the general case" because that is true for NumPy's most general data structure, the array, which is intended for all kinds of multi-dimensional dense data storage and manipulation applications, not just matrix math. Having "rows" and "columns" is a highly specialized context for array operations, but yes, a very common one: that's why NumPy also supplies the matrix class. Convert your array to a numpy.matrix (or use the matrix constructor instead of array to begin with) and you will see behaviour closer to what you expect, though note that current NumPy documentation no longer recommends the matrix class for new code. For more information, see "What are the differences between numpy arrays and matrices? Which one should I use?"
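As a rough illustration of the matrix behaviour described above (assuming numpy.matrix is still available in your installed version), indexing a matrix never drops below two dimensions:

```python
import numpy as np

a = np.array([[0, 1], [2, 3]])
m = np.matrix(a)  # same data, but always two-dimensional

row = m[0, :]
col = m[:, 0]
first = m[0]

print(row.shape)    # (1, 2)
print(col.shape)    # (2, 1)
print(first.shape)  # (1, 2)
```

With a plain array, all three of these would have shape (2,).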

For cases where you're dealing with more than 2 dimensions, take a look at the numpy.expand_dims function. Though the syntax is annoyingly redundant and unpythonically verbose, when I'm working on arrays with more than 2 dimensions (so I cannot use matrix), I'm forever having to use expand_dims to do this kind of thing:

A -= numpy.expand_dims(A.mean(axis=2), 2)   # subtract mean-across-layers from A

instead of

A -= A.mean(axis=2)   # naive attempt: raises ValueError unless the shapes happen to broadcast
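A less verbose alternative, not mentioned in the answer itself, is the keepdims argument that reductions such as mean accept. The sketch below uses a made-up (2, 3, 4) array and checks that it matches the expand_dims form:

```python
import numpy as np

A = np.arange(24, dtype=float).reshape(2, 3, 4)

# keepdims=True leaves the reduced axis in place with length 1.
layer_mean = A.mean(axis=2, keepdims=True)  # shape (2, 3, 1) instead of (2, 3)
centered = A - layer_mean

# Same result as the expand_dims form.
same = np.allclose(centered, A - np.expand_dims(A.mean(axis=2), 2))

print(layer_mean.shape)  # (2, 3, 1)
print(same)              # True
```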

But consider MATLAB, by contrast. MATLAB implicitly asserts that there is no such thing as a one-dimensional object and that the minimum number of dimensions a thing can ever have is 2. Sure, you and I are both highly accustomed to this, but take a moment to realize how arbitrary it is. There is clearly a conceptual difference between a fundamentally one-dimensional object and a two-dimensional object that just happens to have extent 1 in one of its dimensions: the latter is allowed to grow in its second dimension, whereas the former doesn't even know what the second dimension means, and why should it? Hence a.shape == (N,) and a.shape == (N, 1) make perfect sense as separate cases. You might as well ask "why is it not (N, 1, 1)?" or "why is it not (N, 1, 1, 1, 1, 1, 1)?"
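The distinction between these cases can be seen directly. This is only a small sketch with illustrative names; it shows that (N,), (N, 1) and (N, 1, 1) differ in their number of dimensions, and that a 1-D array has no row or column orientation at all:

```python
import numpy as np

v = np.arange(3)            # shape (3,), one dimension
col = v[:, np.newaxis]      # shape (3, 1), two dimensions
deep = v.reshape(3, 1, 1)   # shape (3, 1, 1), three dimensions

print(v.ndim, col.ndim, deep.ndim)  # 1 2 3

# Transposing a 1-D array changes nothing; a 2-D column becomes a row.
print(v.T.shape)    # (3,)
print(col.T.shape)  # (1, 3)

# A 1-D array works on either side of a matrix product.
M = np.arange(9).reshape(3, 3)
left = M @ v
right = v @ M
print(left.shape, right.shape)  # (3,) (3,)
```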

jez answered Oct 16 '22 04:10
