I really like the functionality of namedtuple from the collections module. Specifically, I like how useful it is for points in 2-dimensional space.
In : from collections import namedtuple
In : Point = namedtuple('Point', ['x', 'y'])
In : p = Point(1,2)
In : p.x
Out: 1
In : p.y
Out: 2
I think that's a lot clearer than referring to the first and second entries of a list. I was wondering if there was a way to make Point also be a numpy array. For example:
In: p1 = Point(1,2)
In: p2 = Point(3,4)
In: (p1+p2).x
Out: 4
And similar nice functionality from numpy. In other words, I think I want Point to be a subclass of numpy.ndarray? Can I do this, and how?
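For comparison, here is a minimal stdlib-only sketch of what + actually does on two namedtuple instances. Because a namedtuple is a tuple, + means concatenation, so element-wise math has to be spelled out by hand:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

p1 = Point(1, 2)
p2 = Point(3, 4)

# For tuples, + is concatenation: the result is a plain 4-tuple,
# not a Point, so it has no .x attribute.
concatenated = p1 + p2
print(concatenated)  # (1, 2, 3, 4)

# Element-wise addition must be written explicitly, e.g. with zip and _make.
summed = Point._make(a + b for a, b in zip(p1, p2))
print(summed)    # Point(x=4, y=6)
print(summed.x)  # 4
```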
A structured array with a dtype like point_type does not define math operations that involve several fields.
Using the sample from https://stackoverflow.com/a/33455682/901925:
In [470]: point_type = [('x', float), ('y', float)]
In [471]: points = np.array([(1,2), (3,4), (5,6)], dtype=point_type)
In [472]: points
Out[472]:
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
dtype=[('x', '<f8'), ('y', '<f8')])
In [473]: points[0]+points[1]
...
TypeError: unsupported operand type(s) for +: 'numpy.void' and 'numpy.void'
Instead, I can create a 2d array and then view it as point_type, since the data buffer layout is the same:
In [479]: points = np.array([(1,2), (3,4), (5,6)],float)
In [480]: points
Out[480]:
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
In [481]: points.view(point_type)
Out[481]:
array([[(1.0, 2.0)],
[(3.0, 4.0)],
[(5.0, 6.0)]],
dtype=[('x', '<f8'), ('y', '<f8')])
In [482]: points.view(point_type).view(np.recarray).x
Out[482]:
array([[ 1.],
[ 3.],
[ 5.]])
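To make the "same data buffer" point concrete, here is a small sketch (assuming a NumPy version that provides np.shares_memory) showing that the structured view copies nothing, so writes through one view show up in the other:

```python
import numpy as np

point_type = np.dtype([('x', float), ('y', float)])

# A plain (3, 2) float array: each row is one point.
points = np.array([(1, 2), (3, 4), (5, 6)], float)

# Reinterpret the same buffer as structured records; the last axis
# (2 floats = 16 bytes) collapses into one record, giving shape (3, 1).
as_records = points.view(point_type)
print(as_records.shape)  # (3, 1)

# No data is copied: both arrays use one buffer.
print(np.shares_memory(points, as_records))  # True

# Writing through the structured view changes the 2d array too.
as_records['x'][0, 0] = 10.0
print(points[0])
```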
I can do math across rows, and continue to view the results as points:
In [483]: (points[0]+points[1]).view(point_type).view(np.recarray)
Out[483]:
rec.array([(4.0, 6.0)],
dtype=[('x', '<f8'), ('y', '<f8')])
In [484]: _.x
Out[484]: array([ 4.])
In [485]: points.sum(0).view(point_type)
Out[485]:
array([(9.0, 12.0)],
dtype=[('x', '<f8'), ('y', '<f8')])
Alternatively, I could start with a point_type array, view it as 2d for the math, and then view it back:
pdt1=np.dtype((float, (2,)))
In [502]: points
Out[502]:
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
dtype=[('x', '<f8'), ('y', '<f8')])
In [503]: points.view(pdt1)
Out[503]:
array([[ 1., 2.],
[ 3., 4.],
[ 5., 6.]])
In [504]: points.view(pdt1).sum(0).view(point_type)
Out[504]:
array([(9.0, 12.0)],
dtype=[('x', '<f8'), ('y', '<f8')])
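The round trip above could be wrapped in a pair of helpers. This is a minimal sketch with hypothetical names as_xy and as_points; it uses view(np.float64) plus reshape, which for this dtype gives the same (3, 2) layout as view(pdt1):

```python
import numpy as np

point_type = np.dtype([('x', float), ('y', float)])


def as_xy(points):
    """Hypothetical helper: view structured points as an (..., 2) float array.

    Same layout as points.view(pdt1); no data is copied.
    """
    return points.view(np.float64).reshape(points.shape + (2,))


def as_points(xy):
    """Hypothetical helper: view an (..., 2) float array as structured points.

    The input is made contiguous float64 first (copying only if needed),
    because the view needs each (x, y) pair to be adjacent in memory.
    """
    xy = np.ascontiguousarray(xy, dtype=np.float64)
    return xy.view(point_type).reshape(xy.shape[:-1])


points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)

# Do the math on the 2d view, then view the result as a point again.
total = as_points(as_xy(points).sum(axis=0))
pairwise = as_points(as_xy(points)[0] + as_xy(points)[1])

print(total['x'], total['y'])        # 9.0 12.0
print(pairwise['x'], pairwise['y'])  # 4.0 6.0
```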
So it is possible to view and operate on an array both as 2d and as a recarray. To be pretty or useful, this probably needs to be buried in a user-defined class.
Another option is to crib ideas from the recarray class. At its core it is just a structured array with a specialized __getattribute__ (and __setattr__) method. That method first tries the normal array methods and attributes (e.g. x.shape, x.sum). Then it tries to find attr among the defined field names.
def __getattribute__(self, attr):
    # First try normal lookup (ndarray methods and attributes).
    try:
        return object.__getattribute__(self, attr)
    except AttributeError:  # attr must be a fieldname
        pass
    # Otherwise look attr up among the dtype's field names.
    fielddict = ndarray.__getattribute__(self, 'dtype').fields
    try:
        res = fielddict[attr][:2]  # (field dtype, byte offset)
    except (TypeError, KeyError):
        raise AttributeError("record array has no attribute %s" % attr)
    return self.getfield(*res)
...
So points.view(np.recarray).x becomes points.getfield(*points.dtype.fields['x']).
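A quick sketch checking that equivalence on the sample data: dtype.fields maps each field name to its dtype and byte offset, and getfield with those two values returns the same data as the recarray attribute:

```python
import numpy as np

point_type = np.dtype([('x', float), ('y', float)])
points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)

# dtype.fields maps each name to (field dtype, byte offset[, title]).
field_dtype, offset = points.dtype.fields['x'][:2]
print(field_dtype, offset)  # float64 0

via_recarray = points.view(np.recarray).x
via_getfield = points.getfield(field_dtype, offset)

print(np.array_equal(via_recarray, via_getfield))  # True
```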
An alternate approach would be to borrow from namedtuple (/usr/lib/python3.4/collections/__init__.py) and define x and y properties, which would index the [:,0] and [:,1] columns of the 2d array.
It may be easiest to add those properties to a subclass of np.matrix, letting that class ensure that most math results are 2d.
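Here is a minimal sketch of the property idea. It swaps in a plain np.ndarray subclass instead of np.matrix (NumPy's documentation no longer recommends np.matrix), and indexes [..., 0] and [..., 1] so it works for a single point as well as an (n, 2) array. The class name Points is hypothetical:

```python
import numpy as np


class Points(np.ndarray):
    """Hypothetical ndarray subclass: an (..., 2) float array whose last
    axis holds x and y, exposed as namedtuple-style properties."""

    def __new__(cls, data):
        arr = np.asarray(data, dtype=float)
        if arr.shape[-1:] != (2,):
            raise ValueError("last axis must have length 2 (x, y)")
        return arr.view(cls)

    @property
    def x(self):
        # A view of the first column; writing to it writes to the array.
        return self[..., 0]

    @property
    def y(self):
        return self[..., 1]


p1 = Points([1, 2])
p2 = Points([3, 4])
print((p1 + p2).x)  # 4.0

pts = Points([(1, 2), (3, 4), (5, 6)])
print(pts.x)
print(pts.sum(axis=0).y)  # 12.0
```

Unlike np.matrix, this subclass does nothing to keep results shaped (..., 2): pts.sum(axis=1) is still a Points instance, but its .x no longer means anything.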
You can get somewhat similar functionality using numpy's structured arrays:
In [36]: import numpy as np
...: point_type = [('x', float), ('y', float)]
...: points = np.array([(1,2), (3,4), (5,6)], dtype=point_type)
In [37]: points[2]
Out[37]: (5.0, 6.0)
In [38]: points['x']
Out[38]: array([ 1., 3., 5.])
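Whole-record arithmetic is not defined on structured arrays, but each field is an ordinary float array. A short sketch of field-by-field math on the same sample:

```python
import numpy as np

point_type = [('x', float), ('y', float)]
points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)

# Each field is a plain float array, so math works field by field.
shifted_x = points['x'] + 10
distances = np.hypot(points['x'], points['y'])  # distance from the origin

# Field access returns a view, so in-place updates modify the records.
points['y'] *= 2
print(points['y'])
```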
It is even possible to make all the fields available via attribute access (e.g. points.x) by converting the structured array to a recarray:
In [39]: pts = points.view(np.recarray)
In [40]: pts['x']
Out[40]: array([ 1., 3., 5.])
In [41]: pts.x
Out[41]: array([ 1., 3., 5.])
In [42]: pts[2]
Out[42]: (5.0, 6.0)
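With the recarray view, per-field math reads much like the namedtuple version. A small sketch; note that pts is a view of points, so assigning through an attribute changes the original structured array too:

```python
import numpy as np

point_type = [('x', float), ('y', float)]
points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)
pts = points.view(np.recarray)

# Attribute access returns plain float arrays.
centroid_x = pts.x.mean()
centroid_y = pts.y.mean()
print(centroid_x, centroid_y)  # 3.0 4.0

# pts shares memory with points, so this write is visible in both.
pts.x[0] = 100.0
print(points['x'][0])  # 100.0
```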
Note that recarray apparently has some performance problems, and can be a bit annoying to use. You might also want to look into the pandas library, which also allows accessing fields by attribute, and which does not have the problems of recarray.