
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array

I'm running into the following exception when operating on a third-party-supplied numpy dataset:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array

Under what circumstances will numpy raise this? My code is applying a view on the numpy array, where I'm trying to apply a structured dtype that matches the number of elements in a row.

I'm seeing this error when the statement X.view([('', X.dtype)] * X.shape[1]) is called inside a function f - but not in every call to this function f:

ipdb> X.view([('', X.dtype)] * X.shape[1])
*** ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

X is always an array with two axes (len(X.shape) is always 2), so you'd expect a structured dtype that is X.shape[1] fields long to fit the last axis (X.shape[1]).

The exception does not happen for all datasets, so what causes numpy to throw this for some arrays but not others? I can't even see which .py numpy source code is throwing this error.

I'm finding it hard to produce an MCVE for this, but I've narrowed it down to a colab notebook that is still a little too large to post here.

Here X is supposed to be a subset of the iris dataset, which I got from scikit learn.

from sklearn.datasets import load_iris
X = load_iris().data

My code looks like this:

def f(X):
    X_rows = X.view([('', X.dtype)] * X.shape[1])

def g(X):
    f(X)

def h(X):
    f(X)

# call the functions
g(X) # this runs without a problem
f(X) # this returns the error
J. Doe asked Mar 06 '19 13:03


2 Answers

You are trying to create a view on an array with an incompatible memory layout: the output dtype has an itemsize that does not evenly divide the number of bytes that the 'last' axis of the source array occupies in memory. The same exception is raised if you set the .dtype attribute on the array directly, not only when you call ndarray.view() (which creates a new ndarray and sets the dtype on that new object).

The 'last' axis here is the 'innermost' dimension in terms of the memory layout; for C-order arrays that's shape[-1], for Fortran-order arrays that's shape[0]. That dimension size times the original dtype.itemsize must be divisible by the new dtype.itemsize, or otherwise you can't 'walk' over the internal memory structure cleanly.

For example, for a C-order (row-major order) array with shape (4, 3, 5) and a dtype.itemsize of 8, the 'last' axis takes up 5 * 8 == 40 bytes of memory, and so you can create a view on this with larger dtypes of sizes 10, 20 and 40. The same array but in Fortran order (column-major order), however, uses 4 * 8 == 32 bytes of memory, limiting your options to larger dtypes of sizes 16 and 32 only.
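The arithmetic above can be checked with a small pure-Python sketch. The helper name valid_larger_itemsizes is hypothetical (it is not part of NumPy); it simply lists every itemsize larger than the original that evenly divides the byte length of the innermost axis.

```python
def valid_larger_itemsizes(inner_axis_len, itemsize):
    """List the larger itemsizes that evenly divide the innermost axis' byte length.

    Hypothetical helper for illustration only; not a NumPy function.
    """
    axis_bytes = inner_axis_len * itemsize
    return [size for size in range(itemsize + 1, axis_bytes + 1)
            if axis_bytes % size == 0]

# Shape (4, 3, 5), itemsize 8, C order: the innermost axis is shape[-1] == 5.
c_order_sizes = valid_larger_itemsizes(5, 8)
# Same array in Fortran order: the innermost axis is shape[0] == 4.
f_order_sizes = valid_larger_itemsizes(4, 8)

print(c_order_sizes)  # [10, 20, 40]
print(f_order_sizes)  # [16, 32]
```

Applied to the question, a record of four float64 fields is 32 bytes. That fits a 4-element row (32 bytes), but if the innermost axis in memory is a 6-element column (48 bytes), 32 does not divide 48 and the view is rejected.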

If X.view([('', X.dtype)] * X.shape[1]) fails, then either X has more than 2 dimensions, or it is an array using Fortran ordering. You can correct for the first by using X.shape[-1], and you can check for the latter by looking at ndarray.flags['F_CONTIGUOUS']. Combining these into one expression like the following should work:

X_rows = X.view([('', X.dtype)] * X.shape[0 if X.flags['F_CONTIGUOUS'] else -1])

However, as the ndarray.view() documentation warns:

Views that change the dtype size (bytes per entry) should normally be avoided on arrays defined by slices, transposes, fortran-ordering, etc.[.]

When you try to change the dtype of a Fortran-order array, a warning is raised:

DeprecationWarning: Changing the shape of an F-contiguous array by descriptor assignment is deprecated. To maintain the Fortran contiguity of a multidimensional Fortran array, use 'a.T.view(...).T' instead

so it'd be better to transpose the array, create your view, then transpose the resulting view again:

if X.flags['F_CONTIGUOUS']:
    X_rows = X.T.view([('', X.dtype)] * X.shape[0]).T

You still need to stick to X.shape[0] here, that's shape[-1] of the transposed array.
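To see what the transposed view yields, here is a minimal sketch on a small made-up array (the 6x4 values are placeholders, not the asker's data). Because X.T of an F-contiguous array is C-contiguous, the view step operates on an ordinary C-order layout.

```python
import numpy as np

# Placeholder data: 6 rows, 4 columns, stored in Fortran (column-major) order.
X = np.asfortranarray(np.arange(24, dtype='float64').reshape(6, 4))

# X.T is C-contiguous with shape (4, 6), so each record spans X.shape[0] values.
X_view = X.T.view([('', X.dtype)] * X.shape[0]).T

print(X.T.flags['C_CONTIGUOUS'])  # True
print(X_view.shape)               # (1, 4)
print(len(X_view.dtype.names))    # 6
```

Note that each record in this result holds one column of X (X.shape[0] values), since a column is what lies contiguously in Fortran-order memory.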

The fact that support for changing the dtype on Fortran-order arrays is deprecated can also explain the exception's reference to the 'last axis', which is perfectly natural in terms of C-order arrays but feels counter-intuitive when applied to Fortran-order arrays.

I can't even see which .py numpy source code is throwing this error.

NumPy is primarily written in C (with a dash of Fortran 77), so you need to dig into the source code of the compiled components. The error is raised in the dtype descriptor setter function, which is invoked here when PyArray_View() calls PyObject_SetAttrString() to set the dtype attribute on behalf of the ndarray.view() method.

According to the source code, not only is changing the dtype of Fortran-order arrays deprecated, but views on non-contiguous arrays are not supported at all (meaning that if both X.flags['C_CONTIGUOUS'] and X.flags['F_CONTIGUOUS'] are False then you can't change the dtype at all).
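If a copy is acceptable, one way to sidestep these layout restrictions is to force a C-contiguous array first. This is a swapped-in technique rather than anything proposed above: np.ascontiguousarray() copies the data whenever the input is not already C-contiguous, so the result may no longer share memory with the original. The helper name rows_as_records is hypothetical.

```python
import numpy as np

def rows_as_records(X):
    """Return X with each row of the last axis packed into one structured record.

    Hypothetical helper: copies X if it is not already C-contiguous.
    """
    X = np.ascontiguousarray(X)
    # One unnamed field per element of the last axis; NumPy names them f0, f1, ...
    record_dtype = np.dtype([('', X.dtype)] * X.shape[-1])
    return X.view(record_dtype)

# Placeholder data in Fortran order, the layout that triggers the error.
X = np.asfortranarray(np.arange(24, dtype='float64').reshape(6, 4))
X_rows = rows_as_records(X)
print(X_rows.shape)        # (6, 1)
print(X_rows.dtype.names)  # ('f0', 'f1', 'f2', 'f3')

# A non-contiguous slice is handled the same way, again via a copy.
X_sliced_rows = rows_as_records(X[:, ::2])
print(X_sliced_rows.shape)  # (6, 1)
```

The trade-off is memory: for C-contiguous input no copy is made and the result is a true view, while for any other layout the records refer to a fresh copy.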

Martijn Pieters answered Sep 29 '22 04:09


So trying to reproduce your situation:

In [129]: from sklearn import datasets                                          
In [131]: iris = datasets.load_iris()                                           

In [132]: X = iris.data                                                         
In [133]: X.shape                                                               
Out[133]: (150, 4)
In [134]: X.dtype                                                               
Out[134]: dtype('float64')

In [135]: X_rows = X.view([('',X.dtype)] * X.shape[1])                          
In [136]: X_rows.shape                                                          
Out[136]: (150, 1)
In [137]: X_rows.dtype                                                          
Out[137]: dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])

So far looks good.


I was about to give up, since I didn't want to debug your notebook. But I may have hit on a possible cause.

At the start of your run there's a warning:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:14: DeprecationWarning: Changing the shape of an F-contiguous array by descriptor assignment is deprecated. To maintain the Fortran contiguity of a multidimensional Fortran array, use 'a.T.view(...).T' instead

You are running this function with a pandas apply, which I haven't used much. But I am aware that pandas prefers an 'F' order, since it is Series oriented. So what happens if I switch X to that order?

In [148]: X1 = X.copy(order='F')                                                
In [149]: X_rows = X1[:0].view([('',X1.dtype)] * X1.shape[1])                   
In [150]: X_rows                                                                
Out[150]: 
array([], shape=(0, 1),
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
In [151]: X_rows = X1[:6].view([('',X1.dtype)] * X1.shape[1])                   
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-f3272035dc14> in <module>
----> 1 X_rows = X1[:6].view([('',X1.dtype)] * X1.shape[1])

ValueError: To change to a dtype of a different size, the array must be C-contiguous

OK, it's not the same error, but it does show that order can affect this type of view.


But let's take the array from your comment - and give it an order F:

In [153]: a = np.array([[4.7, 3.2, 1.3, 0.2],[4.6, 3.1, 1.5, 0.2],[4.6, 3.4, 1.4
     ...: , 0.3],[4.4, 3. , 1.3, 0.2],[4.4, 3.2, 1.3, 0.2],[4.6, 3.2, 1.4, 0.2]]
     ...: , dtype='float64', order='F')                                         

In [154]: a.view([('', a.dtype)] * a.shape[1])                                  
/usr/local/bin/ipython3:1: DeprecationWarning: Changing the shape of an F-contiguous array by descriptor assignment is deprecated. To maintain the Fortran contiguity of a multidimensional Fortran array, use 'a.T.view(...).T' instead
  #!/usr/bin/python3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-154-b804730eb70b> in <module>
----> 1 a.view([('', a.dtype)] * a.shape[1])

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

This is it - the warning and the error, as shown in the notebook.
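A quick way to tell which layout an array has before applying such a view is to inspect its flags and strides. A minimal sketch, using placeholder data rather than the notebook's array:

```python
import numpy as np

# Placeholder 6x4 float64 data in C order, plus a Fortran-order copy.
X = np.arange(24, dtype='float64').reshape(6, 4)
X1 = X.copy(order='F')

print(X.flags['C_CONTIGUOUS'], X.flags['F_CONTIGUOUS'])    # True False
print(X1.flags['C_CONTIGUOUS'], X1.flags['F_CONTIGUOUS'])  # False True

# Strides show which axis is innermost in memory (the 8-byte step).
print(X.strides)   # (32, 8): rows are contiguous
print(X1.strides)  # (8, 48): columns are contiguous

# Slicing rows off a Fortran-order array leaves it contiguous in neither order.
part = X1[:3]
print(part.flags['C_CONTIGUOUS'], part.flags['F_CONTIGUOUS'])  # False False
```

When the 8-byte stride sits on axis 0, the 'last axis' in the error message is effectively a column, which is why a record sized for a row may not fit.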

hpaulj answered Sep 29 '22 02:09