Writing functions that accept both 1-D and 2-D numpy arrays?

Tags: python, vectorization, numpy, api-design

My understanding is that 1-D arrays in numpy can be interpreted as either a column-oriented vector or a row-oriented vector. For instance, a 1-D array with shape (8,) can be viewed as a 2-D array of shape (1,8) or shape (8,1) depending on context.

The problem I'm having is that the functions I write to manipulate arrays tend to generalize well in the 2-D case to handle both vectors and matrices, but not so well in the 1-D case.

As such, my functions end up doing something like this:

if arr.ndim == 1:
    ...  # Do it this way
else:
    ...  # Do it that way

Or even this:

# Reshape the 1-D array to a 2-D array
if arr.ndim == 1:
    arr = arr.reshape((1, arr.shape[0]))

# ... Do it the 2-D way ...

That is, I find I can generalize code to handle 2-D cases (r,1), (1,c), (r,c), but not the 1-D cases without branching or reshaping.

It gets even uglier when the function operates on multiple arrays as I would check and convert each argument.

So my question is: am I missing some better idiom? Is the pattern I've described above common to numpy code?

Also, as a related matter of API design principles, if the caller passes a 1-D array to some function that returns a new array, and the return value is also a vector, is it common practice to reshape a 2-D vector (r,1) or (1,c) back to a 1-D array or simply document that the function returns a 2-D array regardless?

Thanks

asked Nov 27 '11 16:11

Joe Holloway

4 Answers

I think in general NumPy functions that require an array of shape (r,c) make no special allowance for 1-D arrays. Instead, they expect the user to either pass an array of shape (r,c) exactly, or for the user to pass a 1-D array that broadcasts up to shape (r,c).

If you pass such a function a 1-D array of shape (c,) it will broadcast to shape (1,c), since broadcasting adds new axes on the left. It can also broadcast to shape (r,c) for an arbitrary r (depending on what other array it is being combined with).

On the other hand, if you have a 1-D array, x, of shape (r,) and you need it to broadcast up to shape (r,c), then NumPy expects the user to pass an array of shape (r,1) since broadcasting will not add the new axes on the right for you.

To do that, the user must pass x[:,np.newaxis] instead of just x.
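A small sketch of the two cases described above may help. The sizes (r = 3, c = 4) and the variable names are arbitrary choices for illustration, not part of the original answer.

```python
import numpy as np

r, c = 3, 4
m = np.ones((r, c))

row = np.arange(c)   # shape (c,)
col = np.arange(r)   # shape (r,)

# (c,) is treated as (1, c) and stretched along the rows.
print((m + row).shape)                  # (3, 4)

# (r,) does not line up with the last axis of (r, c), so this fails.
try:
    m + col
except ValueError as exc:
    print("broadcast error:", exc)

# Adding the trailing axis explicitly gives (r, 1), which broadcasts.
print((m + col[:, np.newaxis]).shape)   # (3, 4)
```

The failure only shows up because r and c differ here; with a square array the (r,) input would silently be treated as a row.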

Regarding return values: I think it better to always return a 2-D array. If the user knows the output will be of shape (1,c), and wants a 1-D array, let her slice off the 1-D array x[0] herself.

By making the return value always the same shape, it will be easier to understand code that uses this function, since it is not always immediately apparent what the shape of the inputs are.

Also, broadcasting blurs the distinction between a 1-D array of shape (c,) and a 2-D array of shape (r,c). If your function returns a 1-D array when fed 1-D input, and a 2-D array when fed 2-D input, then your function makes the distinction strict instead of blurred. Stylistically, this reminds me of checking if isinstance(obj,type), which goes against the grain of duck-typing. Don't do it if you don't have to.

answered Oct 05 '22 23:10

unutbu

unutbu's explanation is good, but I disagree on the return dimension.

The pattern used inside the function depends on the type of function.

Reduce operations with an axis argument can often be written so that the number of dimensions doesn't matter.
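As a rough illustration of a reduction that does not care about the number of dimensions, here is a minimal sketch. The function name rms and its default axis are assumptions made for this example only.

```python
import numpy as np

def rms(x, axis=0):
    """Root mean square along `axis` (hypothetical helper for illustration)."""
    x = np.asarray(x, dtype=float)
    # The same expression works for 1-D, 2-D, or higher-dimensional input.
    return np.sqrt(np.mean(np.square(x), axis=axis))

print(rms([3.0, 4.0]))                        # 1-D in, scalar out
print(rms(np.array([[3.0, 0.0],
                    [4.0, 0.0]])).shape)      # 2-D in, 1-D out: (2,)
```

No branching on ndim is needed because the axis argument carries all the shape information.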

NumPy also has atleast_2d (and atleast_1d) functions, which are commonly used when you need an explicit 2d array. In statistics, I sometimes use a function like atleast_2d_cols that reshapes 1d (r,) to 2d (r,1), either for code that expects 2d or because, when the input array is 1d, the interpretation and the linear algebra require a column vector. (Reshaping is cheap, so this is not a problem.)
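The answer only names atleast_2d_cols without showing it, so the body below is a guess at what such a helper might look like, not the author's actual code. The handling of 0-d input in particular is an assumption.

```python
import numpy as np

def atleast_2d_cols(x):
    """Return x as a 2-D array, treating 1-D input as a column vector.

    Hypothetical sketch; only the name comes from the text above.
    """
    x = np.asarray(x)
    if x.ndim == 0:
        return x.reshape(1, 1)      # assumed behaviour for scalars
    if x.ndim == 1:
        return x[:, np.newaxis]     # (r,) -> (r, 1)
    return x                        # 2-D and higher pass through unchanged

print(atleast_2d_cols([1, 2, 3]).shape)         # (3, 1)
print(atleast_2d_cols(np.ones((3, 2))).shape)   # (3, 2)
```

By contrast, np.atleast_2d turns a 1-D array of shape (r,) into a row of shape (1, r).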

In a third case, I might have different code paths if the lower dimensional case can be done more cheaply or more simply than the higher dimensional case. (Example: if 2d requires several dot products.)

return dimension

I think not following the numpy convention for the return dimension can be very confusing to users of general functions. (Topic-specific functions can be different.) For example, reduce operations lose one dimension.

For many other functions the output dimension matches the input dimension. I think a 1d input should have a 1d output and not an extra redundant dimension. Except for functions in linalg, I don't remember any functions that would return a redundant extra dimension. (The scalar versus 1-element array case is not always consistent.)
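To make the convention concrete, here is a quick check with a few standard NumPy functions. The particular functions chosen are just examples.

```python
import numpy as np

v = np.arange(4.0)                   # shape (4,)
a = np.arange(12.0).reshape(3, 4)    # shape (3, 4)

# Reductions drop the reduced axis.
print(np.sum(v).shape)               # ()
print(np.sum(a, axis=0).shape)       # (4,)

# Elementwise and cumulative functions keep the input's dimensionality.
print(np.sqrt(v).shape)              # (4,)
print(np.cumsum(a, axis=1).shape)    # (3, 4)
```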

On "Stylistically this reminds me of an isinstance check":

Try doing without it if you accept, for example, numpy matrices and masked arrays: you will get funny results that are not easy to debug. That said, for most numpy and scipy functions the user has to know whether the array type will work with them, since there are few isinstance checks and asarray might not always do the right thing.

As a user, I always know what kind of "array_like" I have (a list, a tuple, or which array subclass), especially when I use multiplication.

np.array(np.eye(3).tolist()*3)
np.matrix(range(3)) * np.eye(3)
np.arange(3) * np.eye(3)
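For readers who want to check, the three expressions above give results of three different shapes. The variable names below are added for illustration; note that np.matrix is discouraged in current NumPy but still available.

```python
import numpy as np

a = np.array(np.eye(3).tolist() * 3)   # list repetition, then conversion
b = np.matrix(range(3)) * np.eye(3)    # matrix product
c = np.arange(3) * np.eye(3)           # elementwise product with broadcasting

print(a.shape)   # (9, 3)
print(b.shape)   # (1, 3)
print(c.shape)   # (3, 3)
```

The same `*` means list repetition, matrix multiplication, or broadcast elementwise multiplication depending on the type of the left operand.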

Another example: what does this do?

>>> x = np.array(tuple(range(3)), [('',int)]*3)
>>> x
array((0, 1, 2), 
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
>>> x * np.eye(3)

answered Oct 06 '22 00:10

Josef

This question already has very good answers. Here I just want to add what I usually do (which in a way summarizes the other responses) when I want to write functions that accept a wide range of inputs while the operations I perform on them require a 2d row or column vector.

1. If I know the input is always 1d (array or list):

a. if I need a row: x = np.asarray(x)[None,:]

b. if I need a column: x = np.asarray(x)[:,None]

2. If the input can be either 2d (array or list) with the right shape or 1d (which needs to be converted to 2d row/column):

a. if I need a row: x = np.atleast_2d(x)

b. if I need a column: x = np.atleast_2d(np.asarray(x).T).T or x = np.reshape(x, (len(x),-1)) (the latter seems faster)
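Wrapped as helpers, the second case above might look like the sketch below. The names as_row and as_col are made up for this example; the expressions inside are the ones listed above.

```python
import numpy as np

def as_row(x):
    """1-D (n,) -> (1, n); 2-D input is returned with its shape unchanged."""
    return np.atleast_2d(x)

def as_col(x):
    """1-D (n,) -> (n, 1); 2-D input is returned with its shape unchanged."""
    x = np.asarray(x)
    # len(x) is the size of the first axis, so this fails for 0-d input.
    return x.reshape(len(x), -1)

print(as_row([1, 2, 3]).shape)          # (1, 3)
print(as_col([1, 2, 3]).shape)          # (3, 1)
print(as_col(np.ones((3, 2))).shape)    # (3, 2)
```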

answered Oct 05 '22 23:10

Luca Citi

This is a good use for decorators:

def atmost_2d(func):
  def wrapr(x):
    return func(np.atleast_2d(x)).squeeze()
  return wrapr

For example, this function will pick out the last column of its input.

@atmost_2d
def g(x):
  return x[:,-1]

It works for:

1d:

In [46]: b
Out[46]: array([0, 1, 2, 3, 4, 5])

In [47]: g(b)
Out[47]: array(5)

2d:

In [49]: A
Out[49]:
array([[0, 1],
       [2, 3],
       [4, 5]])

In [50]: g(A)
Out[50]: array([1, 3, 5])

0d:

In [51]: g(99)
Out[51]: array(99)

This answer builds on the previous two.
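One thing to keep in mind with atmost_2d is that squeeze() removes every length-1 axis, including ones the caller passed in deliberately, so a genuine (1, c) input also comes back 0-d from g. A possible variant, sketched here under the assumption that you only want to undo the promotion done by atleast_2d, squeezes only when the input had fewer than two dimensions. The names atmost_2d_strict and last_column are hypothetical.

```python
import functools
import numpy as np

def atmost_2d_strict(func):
    """Like atmost_2d, but squeeze only when the input was promoted."""
    @functools.wraps(func)
    def wrapper(x):
        x = np.asarray(x)
        out = func(np.atleast_2d(x))
        # Collapse length-1 axes only if atleast_2d actually added some.
        return np.squeeze(out) if x.ndim < 2 else out
    return wrapper

@atmost_2d_strict
def last_column(x):
    return x[:, -1]

print(last_column(np.arange(6)).shape)        # ()   promoted input, squeezed
print(last_column(np.array([[7, 8]])).shape)  # (1,) genuine 2-D input, left alone
```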

answered Oct 05 '22 23:10

Patrick
