Suppose I have the following numpy structured array:
In [250]: x
Out[250]:
array([(22, 2, -1000000000, 2000), (22, 2, 400, 2000),
(22, 2, 804846, 2000), (44, 2, 800, 4000), (55, 5, 900, 5000),
(55, 5, 1000, 5000), (55, 5, 8900, 5000), (55, 5, 11400, 5000),
(33, 3, 14500, 3000), (33, 3, 40550, 3000), (33, 3, 40990, 3000),
(33, 3, 44400, 3000)],
dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])
I am trying to modify a subset of the above array to a regular numpy array. It is essential for my application that no copies are created (only views).
Fields are retrieved from the above structured array by using the following function:
def fields_view(array, fields):
return array.getfield(numpy.dtype(
{name: array.dtype.fields[name] for name in fields}
))
If I am interested in fields 'f2' and 'f3', I would do the following:
In [251]: y=fields_view(x,['f2','f3'])
In [252]: y
Out [252]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
There is a way to directly get an ndarray from the 'f2' and 'f3' fields of the original structured array. However, for my application, it is necessary to build this intermediary structured array as this data subset is an attribute of a class.
I can't convert the intermediary structured array to a regular numpy array without doing a copy.
In [253]: y.view(('<f4', len(y.dtype.names)))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-f8fc3a40fd1b> in <module>()
----> 1 y.view(('<f4', len(y.dtype.names)))
ValueError: new type not compatible with array.
This function can also be used to convert a record array to an ndarray:
def recarr_to_ndarr(x,typ):
fields = x.dtype.names
shape = x.shape + (len(fields),)
offsets = [x.dtype.fields[name][1] for name in fields]
assert not any(np.diff(offsets, n=2))
strides = x.strides + (offsets[1] - offsets[0],)
y = np.ndarray(shape=shape, dtype=typ, buffer=x,
offset=offsets[0], strides=strides)
return y
However, I get the following error:
In [254]: recarr_to_ndarr(y,'<f4')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-65-2ebda2a39e9f> in <module>()
----> 1 recarr_to_ndarr(y,'<f4')
<ipython-input-62-8a9eea8e7512> in recarr_to_ndarr(x, typ)
8 strides = x.strides + (offsets[1] - offsets[0],)
9 y = np.ndarray(shape=shape, dtype=typ, buffer=x,
---> 10 offset=offsets[0], strides=strides)
11 return y
12
TypeError: expected a single-segment buffer object
The function works fine if I create a copy:
In [255]: recarr_to_ndarr(np.array(y),'<f4')
Out[255]:
array([[ 2.00000000e+00, -1.00000000e+09],
[ 2.00000000e+00, 4.00000000e+02],
[ 2.00000000e+00, 8.04846000e+05],
[ 2.00000000e+00, 8.00000000e+02],
[ 5.00000000e+00, 9.00000000e+02],
[ 5.00000000e+00, 1.00000000e+03],
[ 5.00000000e+00, 8.90000000e+03],
[ 5.00000000e+00, 1.14000000e+04],
[ 3.00000000e+00, 1.45000000e+04],
[ 3.00000000e+00, 4.05500000e+04],
[ 3.00000000e+00, 4.09900000e+04],
[ 3.00000000e+00, 4.44000000e+04]], dtype=float32)
There seems to be no difference between the two arrays:
In [66]: y
Out[66]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
In [67]: np.array(y)
Out[67]:
array([(2.0, -1000000000.0), (2.0, 400.0), (2.0, 804846.0), (2.0, 800.0),
(5.0, 900.0), (5.0, 1000.0), (5.0, 8900.0), (5.0, 11400.0),
(3.0, 14500.0), (3.0, 40550.0), (3.0, 40990.0), (3.0, 44400.0)],
dtype={'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
This answer is a bit long and rambling. I started with what I knew from previous work on taking array views, and then tried to relate that to your functions.
================
In your case, all fields are 4 bytes long, both floats and ints. I can then view it as all ints or all floats:
In [1431]: x
Out[1431]:
array([(22, 2.0, -1000000000.0, 2000), (22, 2.0, 400.0, 2000),
(22, 2.0, 804846.0, 2000), (44, 2.0, 800.0, 4000),
(55, 5.0, 900.0, 5000), (55, 5.0, 1000.0, 5000),
(55, 5.0, 8900.0, 5000), (55, 5.0, 11400.0, 5000),
(33, 3.0, 14500.0, 3000), (33, 3.0, 40550.0, 3000),
(33, 3.0, 40990.0, 3000), (33, 3.0, 44400.0, 3000)],
dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])
In [1432]: x.view('i4')
Out[1432]:
array([ 22, 1073741824, -831624408, 2000, 22,
1073741824, 1137180672, 2000, 22, 1073741824,
1229225696, 2000, 44, 1073741824, 1145569280,
.... 3000])
In [1433]: x.view('f4')
Out[1433]:
array([ 3.08285662e-44, 2.00000000e+00, -1.00000000e+09,
2.80259693e-42, 3.08285662e-44, 2.00000000e+00,
.... 4.20389539e-42], dtype=float32)
This view is 1d. I can reshape and slice the 2 float columns
In [1434]: x.shape
Out[1434]: (12,)
In [1435]: x.view('f4').reshape(12,-1)
Out[1435]:
array([[ 3.08285662e-44, 2.00000000e+00, -1.00000000e+09,
2.80259693e-42],
[ 3.08285662e-44, 2.00000000e+00, 4.00000000e+02,
2.80259693e-42],
...
[ 4.62428493e-44, 3.00000000e+00, 4.44000000e+04,
4.20389539e-42]], dtype=float32)
In [1437]: x.view('f4').reshape(12,-1)[:,1:3]
Out[1437]:
array([[ 2.00000000e+00, -1.00000000e+09],
[ 2.00000000e+00, 4.00000000e+02],
[ 2.00000000e+00, 8.04846000e+05],
[ 2.00000000e+00, 8.00000000e+02],
...
[ 3.00000000e+00, 4.44000000e+04]], dtype=float32)
That this is a view can be verified by doing a bit of inplace math, and seeing the results in x:
In [1439]: y=x.view('f4').reshape(12,-1)[:,1:3]
In [1440]: y[:,0] += .5
In [1441]: y
Out[1441]:
array([[ 2.50000000e+00, -1.00000000e+09],
[ 2.50000000e+00, 4.00000000e+02],
...
[ 3.50000000e+00, 4.44000000e+04]], dtype=float32)
In [1442]: x
Out[1442]:
array([(22, 2.5, -1000000000.0, 2000), (22, 2.5, 400.0, 2000),
(22, 2.5, 804846.0, 2000), (44, 2.5, 800.0, 4000),
(55, 5.5, 900.0, 5000), (55, 5.5, 1000.0, 5000),
(55, 5.5, 8900.0, 5000), (55, 5.5, 11400.0, 5000),
(33, 3.5, 14500.0, 3000), (33, 3.5, 40550.0, 3000),
(33, 3.5, 40990.0, 3000), (33, 3.5, 44400.0, 3000)],
dtype=[('f1', '<i4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<i4')])
If the field sizes differed this might be impossible. For example if the floats were 8 bytes. The key is picturing how the structured data is stored, and imagining whether that can be viewed as a simple dtype of multiple columns. And field choice has to be equivalent to a basic slice. Working with ['f1','f4'] would be equivalent to advanced indexing with [:,[0,3], which has to be a copy.
==========
The 'direct' field indexing is:
z = x[['f2','f3']].view('f4').reshape(12,-1)
z -= .5
modifies z but with a futurewarning. Also it does not modify x; z has become a copy. I can also see this by looking at z.__array_interface__['data'], the data buffer location (and comparing with that of x and y).
=================
Your fields_view does create a structured view:
In [1480]: w=fields_view(x,['f2','f3'])
In [1481]: w.__array_interface__['data']
Out[1481]: (151950184, False)
In [1482]: x.__array_interface__['data']
Out[1482]: (151950184, False)
which can be used to modify x, w['f2'] -= .5. So it is more versatile than the 'direct' x[['f2','f3']].
The w dtype is
dtype({'names':['f2','f3'], 'formats':['<f4','<f4'], 'offsets':[4,8], 'itemsize':12})
Adding print(shape, typ, offsets, strides) to your recarr_to_ndarr, I get (py3)
In [1499]: recarr_to_ndarr(w,'<f4')
(12, 2) <f4 [4, 8] (16, 4)
....
ValueError: ndarray is not contiguous
In [1500]: np.ndarray(shape=(12,2), dtype='<f4', buffer=w.data, offset=4, strides=(16,4))
...
BufferError: memoryview: underlying buffer is not contiguous
That contiguous problem must be refering to the values shown in w.flags:
In [1502]: w.flags
Out[1502]:
C_CONTIGUOUS : False
F_CONTIGUOUS : False
....
It's interesting that w.dtype.descr converts the 'offsets' into a unnamed field:
In [1506]: w.__array_interface__
Out[1506]:
{'data': (151950184, False),
'descr': [('', '|V4'), ('f2', '<f4'), ('f3', '<f4')],
'shape': (12,),
'strides': (16,),
'typestr': '|V12',
'version': 3}
One way or other, w has a non-contiguous data buffer, which can't be used to create a new array. Flattened, the data buffer looks something like
xoox|xoox|xoox|...
# x 4 bytes we want to skip
# o 4 bytes we want to use
# | invisible bdry between records in x
The y I constructed above has:
In [1511]: y.__array_interface__
Out[1511]:
{'data': (151950188, False),
'descr': [('', '<f4')],
'shape': (12, 2),
'strides': (16, 4),
'typestr': '<f4',
'version': 3}
So it accesses the o bytes with a 4 byte offset, and then (16,4) strides, and (12,2) shape.
If I modify your ndarray call to use the original x.data, it works:
In [1514]: xx=np.ndarray(shape=(12,2), dtype='<f4', buffer=x.data, offset=4, strides=(16,4))
In [1515]: xx
Out[1515]:
array([[ 2.00000000e+00, -1.00000000e+09],
[ 2.00000000e+00, 4.00000000e+02],
....
[ 3.00000000e+00, 4.44000000e+04]], dtype=float32)
with the same array_interface as my y:
In [1516]: xx.__array_interface__
Out[1516]:
{'data': (151950188, False),
'descr': [('', '<f4')],
'shape': (12, 2),
'strides': (16, 4),
'typestr': '<f4',
'version': 3}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With