I'm running into the following exception when operating on a third-party-supplied numpy dataset:
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array
Under what circumstances will numpy raise this? My code is applying a view on the numpy array, where I'm trying to apply a structured dtype that matches the number of elements in a row.
I'm seeing this error when the statement X.view([('', X.dtype)] * X.shape[1]) is called inside a function f, but not in every call to this function f:
ipdb> X.view([('', X.dtype)] * X.shape[1])
*** ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
X is always an array with two axes (len(X.shape) is always 2), so you'd expect a structured dtype that is X.shape[1] long to fit the last axis (X.shape[1]).
The exception does not happen for all datasets, so what causes numpy to throw this for some arrays but not others? I can't even see which .py numpy source code is throwing this error.
I'm finding it hard to produce an MCVE for this, but I've narrowed it down to a Colab notebook that is still a little large to post here.
Here X is supposed to be a subset of the iris dataset, which I got from scikit-learn.
from sklearn.datasets import load_iris
X = load_iris().data
My code looks like this:
def f(X):
    X_rows = X.view([('', X.dtype)] * X.shape[1])

def g(X):
    f(X)

def h(X):
    f(X)

# call the functions
g(X)  # this runs without a problem
f(X)  # this returns the error
You are trying to create a view on an array with an incompatible memory layout, where the output dtype has an itemsize that doesn't evenly divide the number of bytes needed in memory to cover the full length of the 'last' axis of the source array. The exception also applies if you set the .dtype attribute on the array directly, not only to ndarray.view() (which creates a new ndarray with dtype set on that new object).
The 'last' axis here is the 'innermost' dimension in terms of the memory layout; for C-order arrays that's shape[-1], for Fortran-order arrays that's shape[0]. That dimension's size times the original dtype.itemsize must be divisible by the new dtype.itemsize, otherwise you can't 'walk' over the internal memory structure cleanly.
For example, for a C-order (row-major order) array with shape (4, 3, 5) and a dtype.itemsize of 8, the 'last' axis takes up 5 * 8 == 40 bytes of memory, and so you can create a view on this with larger dtypes of sizes 10, 20 and 40. The same array in Fortran order (column-major order), however, has an innermost axis of 4 * 8 == 32 bytes, limiting your options to larger dtypes of sizes 16 and 32 only.
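The arithmetic above can be checked with a small sketch. larger_view_itemsizes is a hypothetical helper written for this illustration (it is not a NumPy function). It locates the innermost axis as the one whose stride equals the itemsize, assuming the array is C- or F-contiguous, and lists the larger itemsizes that divide that axis's byte length. The last lines confirm the C-order case against NumPy itself, using void dtypes purely as a convenient way to ask for "N bytes per element".

```python
import numpy as np

def larger_view_itemsizes(arr):
    """Hypothetical helper: itemsizes larger than arr's that evenly divide
    the byte length of the innermost axis (assumes a contiguous array)."""
    itemsize = arr.dtype.itemsize
    # The innermost axis in memory is the one whose stride equals the itemsize:
    # shape[-1] for C order, shape[0] for Fortran order.
    axis = next(i for i, s in enumerate(arr.strides) if s == itemsize)
    nbytes = arr.shape[axis] * itemsize
    return [n for n in range(itemsize + 1, nbytes + 1) if nbytes % n == 0]

c_arr = np.zeros((4, 3, 5), dtype=np.float64, order='C')
f_arr = np.zeros((4, 3, 5), dtype=np.float64, order='F')

print(c_arr.strides, larger_view_itemsizes(c_arr))  # (120, 40, 8) [10, 20, 40]
print(f_arr.strides, larger_view_itemsizes(f_arr))  # (8, 32, 96) [16, 32]

# NumPy agrees for the C-order case: a 10-byte void dtype fits 40 bytes four times...
print(c_arr.view('V10').shape)  # (4, 3, 4)
# ...while a 16-byte one does not divide 40 and is rejected.
try:
    c_arr.view('V16')
except ValueError as exc:
    print('ValueError:', exc)
```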
If X.view([('', X.dtype)] * X.shape[1]) fails, then either X has more dimensions than just 2, or it is an array using Fortran ordering. You can correct for the first by using X.shape[-1], and you can check for the latter by looking at ndarray.flags['F_CONTIGUOUS']. Combining these into one expression like the following should work:
X_rows = X.view([('', X.dtype)] * X.shape[0 if X.flags['F_CONTIGUOUS'] else -1])
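To see why X.shape[-1] matters when there are more than two dimensions, here is a sketch with a made-up 3-D C-order array (the shape (2, 3, 4) is chosen purely for illustration). Using shape[1] asks for 3-field records of 24 bytes each, which doesn't divide the 32-byte last axis; shape[-1] gives 4-field records that cover it exactly.

```python
import numpy as np

X = np.arange(24, dtype=np.float64).reshape(2, 3, 4)  # C order, last axis = 4 * 8 = 32 bytes

# shape[1] == 3 -> a 24-byte record, and 32 % 24 != 0, so NumPy refuses.
try:
    X.view([('', X.dtype)] * X.shape[1])
except ValueError as exc:
    print('shape[1] failed:', exc)

# shape[-1] == 4 -> a 32-byte record covering the whole last axis.
rows = X.view([('', X.dtype)] * X.shape[-1])
print(rows.shape)            # (2, 3, 1)
print(rows[1, 2, 0].item())  # (20.0, 21.0, 22.0, 23.0)
```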
However, as the ndarray.view() documentation warns:
Views that change the dtype size (bytes per entry) should normally be avoided on arrays defined by slices, transposes, fortran-ordering, etc.[.]
When you try to change the dtype of a Fortran-order array, a warning is raised:
DeprecationWarning: Changing the shape of an F-contiguous array by descriptor assignment is deprecated. To maintain the Fortran contiguity of a multidimensional Fortran array, use 'a.T.view(...).T' instead
so it'd be better to transpose the array, create your view, then transpose the resulting view again:
if X.flags['F_CONTIGUOUS']:
    X_rows = X.T.view([('', X.dtype)] * X.shape[0]).T
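You still need to stick to X.shape[0] here; that's shape[-1] of the transposed array. Be aware that for a Fortran-order array each resulting record holds one column of X (those are the values that are adjacent in memory), not one row; if you need one record per row, make a C-contiguous copy first with np.ascontiguousarray(X).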
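A sketch of the transpose-view-transpose approach on the six iris rows quoted later in this thread, stored in Fortran order. Because X.T is C-contiguous, no DeprecationWarning is involved (the snippet turns warnings into errors to show that). As far as I know, recent NumPy releases (1.23 onward) reject the direct X.view(...) on a 2-D Fortran-order array outright instead of warning, so this form is also the more portable one. Note that each record holds a column of X, not a row.

```python
import warnings
import numpy as np

# Six rows of iris data, stored in Fortran (column-major) order.
X = np.array([[4.7, 3.2, 1.3, 0.2],
              [4.6, 3.1, 1.5, 0.2],
              [4.6, 3.4, 1.4, 0.3],
              [4.4, 3.0, 1.3, 0.2],
              [4.4, 3.2, 1.3, 0.2],
              [4.6, 3.2, 1.4, 0.2]], dtype='float64', order='F')

with warnings.catch_warnings():
    warnings.simplefilter('error')  # turn any warning into an exception
    # X.T is C-contiguous, so its last axis (length X.shape[0]) is contiguous.
    X_cols = X.T.view([('', X.dtype)] * X.shape[0]).T

print(X_cols.shape)         # (1, 4)
print(X_cols[0, 0].item())  # (4.7, 4.6, 4.6, 4.4, 4.4, 4.6), the first column of X
```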
That support for changing the dtype of Fortran-order arrays is deprecated also explains the exception's reference to the 'last axis', which is perfectly natural for C-order arrays but feels counter-intuitive when applied to Fortran-order arrays.
I can't even see which .py numpy source code is throwing this error.
Numpy is primarily written in C (with a dash of Fortran 77), so you need to dig into the source code of the compiled components. The error is thrown in the dtype descriptor setter function, which here is called when the PyArray_View() function calls PyObject_SetAttrString() to set the dtype attribute, as part of the ndarray.view() method.
According to the source code, not only is changing the dtype of Fortran-order arrays deprecated, but views on non-contiguous arrays are not supported at all (meaning that if both X.flags['C_CONTIGUOUS'] and X.flags['F_CONTIGUOUS'] are False, then you can't change to a dtype of a different size at all).
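A sketch of the non-contiguous case, with a column slice standing in for "an array you were handed". X[:, ::2] is neither C- nor F-contiguous and its last axis isn't contiguous either, so the view is refused. If I remember correctly, NumPy 1.23 and later only require the last axis to be contiguous, which is why a column slice rather than a row slice is used here: it fails on both old and new versions. Copying with np.ascontiguousarray first makes the view possible.

```python
import numpy as np

X = np.arange(12, dtype=np.float64).reshape(3, 4)
sub = X[:, ::2]  # every other column: shape (3, 2), strides (32, 16)

print(sub.flags['C_CONTIGUOUS'], sub.flags['F_CONTIGUOUS'])  # False False
try:
    sub.view([('', sub.dtype)] * sub.shape[-1])
except ValueError as exc:
    print('ValueError:', exc)

# A contiguous copy has an ordinary memory layout, so the view works.
rows = np.ascontiguousarray(sub).view([('', sub.dtype)] * sub.shape[-1])
print(rows.shape)         # (3, 1)
print(rows[2, 0].item())  # (8.0, 10.0)
```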
So trying to reproduce your situation:
In [129]: from sklearn import datasets
In [131]: iris = datasets.load_iris()
In [132]: X = iris.data
In [133]: X.shape
Out[133]: (150, 4)
In [134]: X.dtype
Out[134]: dtype('float64')
In [135]: X_rows = X.view([('',X.dtype)] * X.shape[1])
In [136]: X_rows.shape
Out[136]: (150, 1)
In [137]: X_rows.dtype
Out[137]: dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
So far, so good.
I was about to give up, since I didn't want to debug your notebook, but I may have hit on a possible cause.
At the start of your run there's a warning:
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:14: DeprecationWarning: Changing the shape of an F-contiguous array by descriptor assignment is deprecated. To maintain the Fortran contiguity of a multidimensional Fortran array, use 'a.T.view(...).T' instead
You are running this function with a pandas apply, which I haven't used much. But I am aware that pandas prefers an 'F' order, since it is Series oriented. So what happens if I switch X to that order?
In [148]: X1 = X.copy(order='F')
In [149]: X_rows = X1[:0].view([('',X1.dtype)] * X1.shape[1])
In [150]: X_rows
Out[150]:
array([], shape=(0, 1),
dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
In [151]: X_rows = X1[:6].view([('',X1.dtype)] * X1.shape[1])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-151-f3272035dc14> in <module>
----> 1 X_rows = X1[:6].view([('',X1.dtype)] * X1.shape[1])
ValueError: To change to a dtype of a different size, the array must be C-contiguous
OK, it's not the same error, but it does show that order can affect this type of view.
But let's take the array from your comment and give it order F:
In [153]: a = np.array([[4.7, 3.2, 1.3, 0.2],[4.6, 3.1, 1.5, 0.2],[4.6, 3.4, 1.4
...: , 0.3],[4.4, 3. , 1.3, 0.2],[4.4, 3.2, 1.3, 0.2],[4.6, 3.2, 1.4, 0.2]]
...: , dtype='float64', order='F')
In [154]: a.view([('', a.dtype)] * a.shape[1])
/usr/local/bin/ipython3:1: DeprecationWarning: Changing the shape of an F-contiguous array by descriptor assignment is deprecated. To maintain the Fortran contiguity of a multidimensional Fortran array, use 'a.T.view(...).T' instead
#!/usr/bin/python3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-154-b804730eb70b> in <module>
----> 1 a.view([('', a.dtype)] * a.shape[1])
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.
This is it: the warning and the error, as shown in the notebook.
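Putting the two answers together, a defensive version of the question's f could copy to C order before viewing. This is a minimal sketch, not the asker's actual code: row_records is a hypothetical name, and np.ascontiguousarray only copies when the input isn't already C-contiguous. It is exercised on the kinds of input that failed in the sessions above (a Fortran-order array and a row slice of one) and gives one record per row in every case.

```python
import numpy as np

def row_records(X):
    """Hypothetical helper: view each row of a 2-D array as one structured record."""
    X = np.ascontiguousarray(X)  # no-op for C-contiguous input, copy otherwise
    return X.view([('', X.dtype)] * X.shape[-1])

a = np.array([[4.7, 3.2, 1.3, 0.2],
              [4.6, 3.1, 1.5, 0.2],
              [4.6, 3.4, 1.4, 0.3]], dtype='float64')

# C order, Fortran order, and a non-contiguous row slice of a Fortran array.
for arr in (a, a.copy(order='F'), np.asfortranarray(np.vstack([a, a]))[:3]):
    recs = row_records(arr)
    print(recs.shape, recs[0, 0].item())  # (3, 1) (4.7, 3.2, 1.3, 0.2)
```

The copy costs memory for large Fortran-order inputs, but it is the only way to get row records there, since the rows of such an array are not contiguous in memory.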