Namedtuple in Numpy

Question

I really like the functionality of the namedtuple collection. Specifically, I like how useful it is for points in 2-dimensional space.

In : from collections import namedtuple

In : Point = namedtuple('Point', ['x', 'y'])

In : p = Point(1,2)

In : p.x
Out: 1

In : p.y
Out: 2

I think that's a lot clearer than referring to the first and second entries of a list. I was wondering if there is a way to make Point also behave like a numpy array. For example:

In : p1 = Point(1,2)
In : p2 = Point(3,4)
In : (p1+p2).x
Out: 4

And similar nice functionality from numpy. In other words, I think I want Point to be a subclass of numpy's array type? Can I do this? And how?

hpaulj · Accepted Answer

A structured array like point_type does not define math operations that involve several fields.

With the sample from https://stackoverflow.com/a/33455682/901925

In [470]: point_type = [('x', float), ('y', float)]
In [471]: points = np.array([(1,2), (3,4), (5,6)], dtype=point_type)
In [472]: points
Out[472]: 
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [473]: points[0]+points[1]
...
TypeError: unsupported operand type(s) for +: 'numpy.void' and 'numpy.void'

Instead, I can create a 2d array and then view it as point_type; the data buffer layout is the same:

In [479]: points = np.array([(1,2), (3,4), (5,6)],float)
In [480]: points
Out[480]: 
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
In [481]: points.view(point_type)
Out[481]: 
array([[(1.0, 2.0)],
       [(3.0, 4.0)],
       [(5.0, 6.0)]], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [482]: points.view(point_type).view(np.recarray).x
Out[482]: 
array([[ 1.],
       [ 3.],
       [ 5.]])

I can do math across rows, and continue to view the results as points:

In [483]: (points[0]+points[1]).view(point_type).view(np.recarray)
Out[483]: 
rec.array([(4.0, 6.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [484]: _.x
Out[484]: array([ 4.])
In [485]: points.sum(0).view(point_type)
Out[485]: 
array([(9.0, 12.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])

Alternatively, I could start with the point_type array, view it as 2d for the math, and then view the result back as point_type:

pdt1=np.dtype((float, (2,)))
In [502]: points
Out[502]: 
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [503]: points.view(pdt1)
Out[503]: 
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
In [504]: points.view(pdt1).sum(0).view(point_type)
Out[504]: 
array([(9.0, 12.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])

So it is possible to view and operate on an array both as 2d and as a recarray. To be pretty or useful, it probably needs to be buried in a user-defined class.
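As a rough sketch of what such a wrapper might look like, the two views can be hidden behind a pair of helpers. The names as_plain and as_points are made up for illustration and are not part of numpy, and the sketch assumes a 1d structured array whose fields are all float64 with no padding:

```python
import numpy as np

point_type = np.dtype([('x', float), ('y', float)])

def as_plain(points):
    """View a 1d structured point array of shape (n,) as an (n, 2) float array."""
    # Both fields are float64, so the buffer is just n pairs of floats.
    return points.view(float).reshape(points.shape + (2,))

def as_points(plain):
    """View a (2,) or (n, 2) float array as a 1d record array with x and y."""
    plain = np.ascontiguousarray(np.atleast_2d(plain), dtype=float)
    return plain.view(point_type).reshape(-1).view(np.recarray)

points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)
plain = as_plain(points)                      # ordinary 2d array for the math
pair_sum = as_points(plain[0] + plain[1])     # back to named fields
total = as_points(plain.sum(0))
print(pair_sum.x, total.y)
```

The math always happens on the plain float view, and the field names are only attached again at the end, which is the same round trip as in the session above.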

Another option is to crib ideas from the recarray class. At its core it is just a structured array with a specialized __getattribute__ (and __setattr__) method. That method first tries the normal array methods and attributes (e.g. x.shape, x.sum). Then it tries to find attr in the defined field names.

def __getattribute__(self, attr):
    try:
        return object.__getattribute__(self, attr)
    except AttributeError: # attr must be a fieldname
        pass
    fielddict = ndarray.__getattribute__(self, 'dtype').fields
    try:
        res = fielddict[attr][:2]
    except (TypeError, KeyError):
        raise AttributeError("record array has no attribute %s" % attr)
    return self.getfield(*res)
    ...

points.view(np.recarray).x becomes points.getfield(*points.dtype.fields['x']).
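A small check of that equivalence, using the same sample array as above. This is only an illustration of the lookup, not the recarray implementation itself:

```python
import numpy as np

point_type = [('x', float), ('y', float)]
points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)

# dtype.fields maps each field name to a tuple that starts with
# (field dtype, byte offset); getfield takes exactly those two values.
x_dtype, x_offset = points.dtype.fields['x'][:2]
y_dtype, y_offset = points.dtype.fields['y'][:2]

via_getfield = points.getfield(x_dtype, x_offset)
via_recarray = points.view(np.recarray).x
via_index = points['x']

print(x_offset, y_offset)
print(via_getfield)
```

All three ways of reaching the x field return the same values; the attribute access just saves typing.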

An alternate approach would be to borrow from namedtuple (/usr/lib/python3.4/collections/__init__.py), and define x and y properties, which would index the [:,0] and [:,1] columns of the 2d array. It may be easiest to add those properties to a subclass of np.matrix, letting that class ensure that most math results are 2d.
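Here is a minimal sketch of the property idea. It is an assumption-laden illustration: the class name PointArray is made up, and it subclasses np.ndarray rather than np.matrix (the numpy docs no longer recommend np.matrix), so nothing forces results to stay 2d. It only assumes that the last axis holds the (x, y) pair:

```python
import numpy as np

class PointArray(np.ndarray):
    """Hypothetical ndarray subclass whose last axis holds (x, y)."""

    def __new__(cls, data):
        arr = np.asarray(data, dtype=float)
        if arr.shape[-1:] != (2,):
            raise ValueError("last axis must have length 2")
        return arr.view(cls)

    @property
    def x(self):
        # Plain ndarray, so a column of x values is not treated as points.
        return np.asarray(self[..., 0])

    @property
    def y(self):
        return np.asarray(self[..., 1])

p1 = PointArray([1, 2])
p2 = PointArray([3, 4])
total = p1 + p2                      # elementwise math keeps the subclass
many = PointArray([(1, 2), (3, 4), (5, 6)])
print(total.x, total.y)
print(many.x, many.sum(0).y)
```

Operations that change the length of the last axis (for example many.sum() or many[:, 0]) still return a PointArray whose x and y no longer make sense, which is the kind of edge case a real class would have to handle.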

Bas Swinckels · Answer

You can get somewhat similar functionality using numpy's structured arrays:

In [36]: import numpy as np
    ...: point_type = [('x', float), ('y', float)]
    ...: points = np.array([(1,2), (3,4), (5,6)], dtype=point_type)

In [37]: points[2]
Out[37]: (5.0, 6.0)

In [38]: points['x']
Out[38]: array([ 1.,  3.,  5.])

It is even possible to make all the fields available using attribute access (e.g. using points.x) by converting the structured array to a recarray:

In [39]: pts = points.view(np.recarray)

In [40]: pts['x']
Out[40]: array([ 1.,  3.,  5.])

In [41]: pts.x
Out[41]: array([ 1.,  3.,  5.])

In [42]: pts[2]
Out[42]: (5.0, 6.0)

Note that recarray apparently has some performance problems, and can be a bit annoying to use. You might also want to look into the pandas library, which also allows accessing fields by attribute, and which does not have the problems of recarray.
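For comparison, a minimal pandas sketch of the same points, assuming pandas is installed. The column names x and y are chosen to match the example above:

```python
import pandas as pd

points = pd.DataFrame({'x': [1.0, 3.0, 5.0], 'y': [2.0, 4.0, 6.0]})

xs = points.x                 # column access by attribute, same as points['x']
third = points.iloc[2]        # one row, as a Series indexed by 'x' and 'y'
pair_sum = points.iloc[0] + points.iloc[1]   # rows add label by label
total = points.sum()          # column sums, again indexed by 'x' and 'y'

print(pair_sum.x, total.y)
```

Rows come back as a Series whose index is the column names, so attribute access keeps working on the results of arithmetic.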

Tags:

python

oop

namedtuple

numpy

Ben

2 Answers

hpaulj

Bas Swinckels
