Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy safe programming

Tags:

python

numpy

Are there any sources or guidelines for safe, bug-free numerical programming with numpy?

I'm asking because I've painfully learned that numpy does many things that seem to really ask for bugs to happen, such as...

Adding matrices of different sizes ("broadcasting") without complaining:

In: np.array([1]) + np.identity(2)
Out: array([[ 2.,  1.],
            [ 1.,  2.]])

Returning different data types depending on the input:

In: scalar1 = 1
In: scalar2 = 1.
In: np.array(scalar1).dtype
Out: dtype('int32')
In: np.array(scalar2).dtype
Out: dtype('float64')

Or simply not performing a desired operation (again, depending on the data type) without raising any warnings:

In: np.squeeze(np.array([[1, 1]])).ndim
Out: 1
In: np.squeeze(np.matrix([[1, 1]])).ndim
Out: 2

These are all very hard to discover bugs, since they do not raise any exceptions or warnings and often return results of the valid data types / shapes. Therefore my question: Are there any general guidelines for improving the safety and preventing bugs in mathematical programming with numpy?

[Note that I don't believe that this answer will attract "opinionated answers and discussions" since it is not about personal recommendations, but rather asking whether there are any existing guidelines or sources on the subject at all - of which I could not find any.]

like image 447
jhin Avatar asked Feb 24 '26 07:02

jhin


1 Answers

Frequently I ask SO questioners, what's the shape? the dtype? even the type. Keeping tracking of those properties is a big part of good numpy programming. Even in MATLAB I found that getting the size right was 80% of debugging.

type

The squeeze example revolves around type, the ndarray class versus the np.matrix subclass:

In [160]: np.squeeze(np.array([[1, 1]]))
Out[160]: array([1, 1])
In [161]: np.squeeze(np.matrix([[1, 1]]))
Out[161]: matrix([[1, 1]])

np.matrix object is, by definition, always 2d. That's the core of how it redefines ndarray operations.

Many numpy functions delegate their work to methods. The code fornp.squeeze` is:

try:
    squeeze = a.squeeze
except AttributeError:
    return _wrapit(a, 'squeeze')
try:
    # First try to use the new axis= parameter
    return squeeze(axis=axis)
except TypeError:
    # For backwards compatibility
    return squeeze()

So In [161] is really:

In [163]: np.matrix([[1, 1]]).squeeze()
Out[163]: matrix([[1, 1]])

np.matrix.squeeze has its own documentation.

As a general rule we discourage the use of np.matrix. It was a created years ago to make things easier for wayward MATLAB programmers. Back in those days MATLAB only had 2d matrices (even now MATLAB 'scalars' are 2d).

dtype

np.array is a powerful function. Usually its behavior is intuitive, but sometimes it makes too many assumptions.

Usually it takes clues from the input, whether integer, float, string, and/or lists:

In [170]: np.array(1).dtype
Out[170]: dtype('int64')
In [171]: np.array(1.0).dtype
Out[171]: dtype('float64')

But it provides a number of parameters. Use those if you need more control:

array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

In [173]: np.array(1, float).dtype
Out[173]: dtype('float64')
In [174]: np.array('1', float).dtype
Out[174]: dtype('float64')
In [177]: np.array('1', dtype=float,ndmin=2)
Out[177]: array([[1.]])

Look at it's docs, and also at the https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html page which lists many other array creation functions. Look at some their code as well.

For example np.atleast_2d does a lot of shape checking:

def atleast_2d(*arys):
    res = []
    for ary in arys:
        ary = asanyarray(ary)
        if ary.ndim == 0:
            result = ary.reshape(1, 1)
        elif ary.ndim == 1:
            result = ary[newaxis,:]
        else:
            result = ary
        res.append(result)
    if len(res) == 1:
        return res[0]
    else:
        return res

Functions like this are good examples of defensive programming.

We get a lot SO questions about 1d arrays with dtype=object.

In [272]: np.array([[1,2,3],[2,3]])
Out[272]: array([list([1, 2, 3]), list([2, 3])], dtype=object)

np.array tries to create a multidimensional array with a uniform dtype. But if the elements differ in size or can't be cast to the same dtype, it will fall back on object dtype. This is one of those situations where we need to pay attention to shape and dtype.

broadcasting

Broadcasting has been a part of numpy forever, and there's no way of turning it off. Octave and MATLAB have added it later, and do enable warning switches.

The first defensive step is to understand the broadcasting principles, namely

  • it can expand the beginning dimensions to match
  • it coerce unitary dimensions to match.

So a basic example is:

In [180]: np.arange(3)[:,None] + np.arange(4)
Out[180]: 
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5]])

The first term is (3,) expanded to (3,1). The second is (4,) which, by broadcasting expands to (1,4). Together (3,1) and (1,4) broadcast to (3,4).

Many numpy functions have parameters that make keeping track of dimensions easier. For example sum (and others) has a keepdims parameter:

In [181]: arr = _
In [182]: arr.sum(axis=0)
Out[182]: array([ 3,  6,  9, 12])         # (4,) shape
In [183]: arr.sum(axis=0,keepdims=True)
Out[183]: array([[ 3,  6,  9, 12]])       # (1,4) shape
In [184]: arr/_                           # (3,4) / (1,4) => (3,4)
Out[184]: 
array([[0.        , 0.16666667, 0.22222222, 0.25      ],
       [0.33333333, 0.33333333, 0.33333333, 0.33333333],
       [0.66666667, 0.5       , 0.44444444, 0.41666667]])

In this case the keepdims isn't essential since (3,4)/(4,) works. But with axis=1 sum the shape becomes (3,) which can't broadcast with (3,4). But (3,1) can:

In [185]: arr/arr.sum(axis=1,keepdims=True)
Out[185]: 
array([[0.        , 0.16666667, 0.33333333, 0.5       ],
       [0.1       , 0.2       , 0.3       , 0.4       ],
       [0.14285714, 0.21428571, 0.28571429, 0.35714286]])

To manage shapes I like to:

  • display shape while debugging
  • test snippets interactively
  • test with diagnostic shapes, e.g. np.arange(24).reshape(2,3,4)
  • assertion statements in functions can be useful assert(arr.ndim==1)

typing

Recent Python 3 versions have added a typing module

https://docs.python.org/3/library/typing.html

Even for built-in Python types it's provisional. I'm not sure much has been added for numpy.

like image 165
hpaulj Avatar answered Feb 26 '26 21:02

hpaulj