
Namedtuple in Numpy

I really like the functionality of the namedtuple collection. Specifically, I like how useful it is for points in 2-dimensional space.

In : from collections import namedtuple

In : Point = namedtuple('Point', ['x', 'y'])

In : p = Point(1,2)

In : p.x
Out: 1

In : p.y
Out: 2

I think that's a lot clearer than referring to the first and second entries of a list. I was wondering whether there is a way to make Point also behave as a numpy array. For example:

In : p1 = Point(1,2)
In : p2 = Point(3,4)
In : (p1+p2).x
Out: 4

I would also like similar nice functionality from numpy. In other words, I think I want Point to be a subclass of a numpy array. Can I do this, and if so, how?

Ben asked Oct 30 '15 03:10


2 Answers

A structured array with a dtype like point_type does not define math operations that involve several fields.

With the sample from https://stackoverflow.com/a/33455682/901925:

In [470]: point_type = [('x', float), ('y', float)]
In [471]: points = np.array([(1,2), (3,4), (5,6)], dtype=point_type)
In [472]: points
Out[472]: 
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [473]: points[0]+points[1]
...
TypeError: unsupported operand type(s) for +: 'numpy.void' and 'numpy.void'

Instead, I can create a 2d array and then view it as point_type; the data buffer layout will be the same:

In [479]: points = np.array([(1,2), (3,4), (5,6)],float)
In [480]: points
Out[480]: 
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
In [481]: points.view(point_type)
Out[481]: 
array([[(1.0, 2.0)],
       [(3.0, 4.0)],
       [(5.0, 6.0)]], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [482]: points.view(point_type).view(np.recarray).x
Out[482]: 
array([[ 1.],
       [ 3.],
       [ 5.]])

I can do math across rows, and continue to view the results as points:

In [483]: (points[0]+points[1]).view(point_type).view(np.recarray)
Out[483]: 
rec.array([(4.0, 6.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [484]: _.x
Out[484]: array([ 4.])
In [485]: points.sum(0).view(point_type)
Out[485]: 
array([(9.0, 12.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])

Alternatively, I could start with the point_type array, view it as 2d for the math, and then view the result back as point_type:

pdt1=np.dtype((float, (2,)))
In [502]: points
Out[502]: 
array([(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])
In [503]: points.view(pdt1)
Out[503]: 
array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])
In [504]: points.view(pdt1).sum(0).view(point_type)
Out[504]: 
array([(9.0, 12.0)], 
      dtype=[('x', '<f8'), ('y', '<f8')])

So it is possible to view and operate on an array both as 2d and as a recarray. To be pretty or useful, it probably needs to be buried in a user-defined class.
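As a hedged alternative to the raw views above, the same round trip can be sketched with the helpers in numpy.lib.recfunctions. This assumes NumPy 1.16 or later, where structured_to_unstructured and unstructured_to_structured are available; the variable names are only illustrative.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

point_type = np.dtype([('x', float), ('y', float)])
points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)

# Convert the structured array to a plain (3, 2) float array for the math.
plain = rfn.structured_to_unstructured(points)

# Sum over the rows, giving one (x, y) pair.
total = plain.sum(axis=0)

# Convert the result back so the fields can be addressed by name again.
result = rfn.unstructured_to_structured(total, dtype=point_type)
print(float(result['x']), float(result['y']))  # 9.0 12.0
```

These helpers also cope with dtypes whose fields are not all the same type or are padded, which a plain view cannot do.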

Another option is to crib ideas from the recarray class. At its core it is just a structured array with a specialized __getattribute__ (and __setattr__) method. That method first tries the normal array methods and attributes (e.g. x.shape, x.sum). Then it tries to find attr in the defined field names.

def __getattribute__(self, attr):
    try:
        return object.__getattribute__(self, attr)
    except AttributeError: # attr must be a fieldname
        pass
    fielddict = ndarray.__getattribute__(self, 'dtype').fields
    try:
        res = fielddict[attr][:2]
    except (TypeError, KeyError):
        raise AttributeError("record array has no attribute %s" % attr)
    return self.getfield(*res)
    ...

points.view(np.recarray).x becomes points.getfield(*points.dtype.fields['x']).
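A small check of that equivalence, as a sketch using the same point_type as above: each entry of dtype.fields is a (dtype, byte offset) pair here, which is exactly what getfield expects.

```python
import numpy as np

point_type = [('x', float), ('y', float)]
points = np.array([(1, 2), (3, 4), (5, 6)], dtype=point_type)

# dtype.fields maps each field name to a (dtype, byte offset) pair.
field_dtype, offset = points.dtype.fields['x']

# Fetch the field directly, and through recarray attribute access.
via_getfield = points.getfield(field_dtype, offset)
via_recarray = points.view(np.recarray).x

print(np.array_equal(via_getfield, via_recarray))  # True
```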

An alternate approach would be to borrow from namedtuple (/usr/lib/python3.4/collections/__init__.py), and define x and y properties, which would index the [:,0] and [:,1] columns of the 2d array. It may be easiest to add those properties to a subclass of np.matrix, letting that class ensure that most math results are 2d.
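A minimal sketch of the property idea, subclassing np.ndarray directly rather than np.matrix. The class name Points and its shape check are assumptions made for illustration, not part of numpy.

```python
import numpy as np

class Points(np.ndarray):
    """Hypothetical array subclass whose last axis holds (x, y)."""

    def __new__(cls, data):
        # View the input data as this subclass.
        obj = np.asarray(data, dtype=float).view(cls)
        if obj.shape[-1] != 2:
            raise ValueError("last axis must have length 2")
        return obj

    @property
    def x(self):
        # Return a plain ndarray so the column is not treated as points.
        return np.asarray(self[..., 0])

    @property
    def y(self):
        return np.asarray(self[..., 1])

p1 = Points([1, 2])
p2 = Points([3, 4])
total = p1 + p2          # arithmetic keeps the subclass
print(total.x, total.y)  # 4.0 6.0
```

The shape check only runs in the constructor, so slices or reductions that drop the last axis still come back as Points; a fuller version would need to handle that, which is what np.matrix's always-2d behavior was meant to help with.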

hpaulj answered Sep 30 '22 15:09


You can get somewhat similar functionality using numpy's structured arrays:

In [36]: import numpy as np
    ...: point_type = [('x', float), ('y', float)]
    ...: points = np.array([(1,2), (3,4), (5,6)], dtype=point_type)

In [37]: points[2]
Out[37]: (5.0, 6.0)

In [38]: points['x']
Out[38]: array([ 1.,  3.,  5.])

It is even possible to make all the fields available using attribute access (e.g. using points.x) by converting the structured array to a recarray:

In [39]: pts = points.view(np.recarray)

In [40]: pts['x']
Out[40]: array([ 1.,  3.,  5.])

In [41]: pts.x
Out[41]: array([ 1.,  3.,  5.])

In [42]: pts[2]
Out[42]: (5.0, 6.0)

Note that recarray apparently has some performance problems, and can be a bit annoying to use. You might also want to look into the pandas library, which also allows accessing fields by attribute, and which does not have the problems of recarray.
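For comparison, a rough sketch of the pandas route, assuming pandas is installed; the column names x and y simply mirror the fields used above.

```python
import pandas as pd

# One row per point, one column per coordinate.
points = pd.DataFrame({'x': [1.0, 3.0, 5.0], 'y': [2.0, 4.0, 6.0]})

# Columns are available by attribute as well as by key.
print(points.x.tolist())  # [1.0, 3.0, 5.0]

# Adding two rows gives a Series indexed by 'x' and 'y'.
total = points.iloc[0] + points.iloc[1]
print(total.x, total.y)  # 4.0 6.0
```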

Bas Swinckels answered Sep 30 '22 15:09