Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic way to Implement Data Types (Python 2.7)

Tags:

python

types

oop

The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming; the development of new robust data-types that will not only reduce the amount of code I have to write by wrapping functionality into the type (for example vector addition, instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc, we can just use newVec = vec1 + vec2) but will also reveal problems in your code at compile time through the strong type system.

A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that will have all the same behavior as a normal number in python, but will always be within its (dynamic) boundary values.

class BoundInt:
    def __init__(self, target = 0, low = 0, high = 1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if (self._value > self.upperLimit):
            self._value = self.upperLimit
        elif (self._value < self.lowerLimit):
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)

This is a good start, but it requires accessing the meat of these BoundInt types like so

x = BoundInt()
y = 4
x.set(y)           #it would be nicer to do something like x = y
print y            #prints "4"
print x            #prints "1"
z = 2 + x.value()  #again, it would be nicer to do z = 2 + x
print z            #prints "3" 

We can add a large number of python's "magic method" definitions to the class to add some more functionality:

def __add__(self, other):
    return self._value + other

def __sub__(self, other):
    return self._value - other

def __mul__(self, other):
    return self._value * other

def __div__(self, other):
    return self._value / other

def __pow__(self, power):
    return self._value**power

def __radd__(self, other):
    return self._value + other

#etc etc

Now the code is rapidly exploding in size, and there is a ton of repetition to what is being written, for very little return, this doesn't seem very pythonic at all.

Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects

x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)

Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.

All of this feels terribly like trying to write c++ code in python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current, instead of letting it carry me forward.

I very much want to learn to code python 'pythonic-ally', what is the best way to approach this sort of problem domain? What are good resources to learn proper pythonic style?

like image 545
Sam Coulter Avatar asked Nov 06 '12 00:11

Sam Coulter


3 Answers

There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.

However, there are some answers to what you want to do. I'll start with the most serious, then work backward.

Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects … Which, as far as I'm aware requires the use of rather large/ugly if/else type checking statements within the BoundInt() constructor, as python does not support (c style) overloading.

Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:

def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)

I'm assuming you've already added a __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).

Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.

For example, imagine this C++ code:

BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);

This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.

Now look at the Python equivalent:

def foo(param): local = param; return local
bar = BoundInt();
baz = foo(bar)

Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)

Now, what about forwarding all those operators to the value?

Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. If you want a sketch of this, provide the extra info and ask, but you're probably better off looking for examples of dynamic method creation.

So, is that Pythonic? Well, it depends.

If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.

And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).

But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.

A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code. And it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special message in terms of generic dynamic forwarding functions), and if you've got both 2.x and 3.x around you can compare and contrast the two.

Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.

Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:

>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> j
5

I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same copy. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)

If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.

like image 100
abarnert Avatar answered Sep 25 '22 00:09

abarnert


There are likely many opinions on this. But regarding the proliferation of of special methods, you will just have to do that to make it complete. But at least you only do that once, in one place. Also the built-in number types can be subclassed. That's what I did for a similar implementation, that you can look it.

like image 29
Keith Avatar answered Sep 24 '22 00:09

Keith


Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number. That is very much trying to program C++ in Python, and will cause you endless amounts of headaches if you actually want to treat these the same way you do numbers, because every time you pass them to functions they are passed by reference (like everything in Python). So you'll end up with large amounts of aliasing in things you think you can treat like numbers, and you will almost certainly encounter bugs due to mutating the value of numbers you don't realise are aliased, or expecting to be able to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.

To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.

If I really wanted to do something like this, the approach I would take would be to subclass intis to treat BoundInt as a class factory; you give it a range, and it gives you the type of integers restricted to be in that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:

_bound_int_cache = {}
def BoundInt(low, low):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            low = low
            high = high
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.max)
                return int.__new__(cls, value)

        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]

(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)

You would use this like:

B = BoundInt(1, 10)
x = B(7)

The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.

Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.

If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:

  1. Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y = y + x)
  2. Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were sys.minint and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
  3. Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.

Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).

If you take this approach, you probably don't want to subclass int. Because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want it not to recognose boundedInt as an int value, so that int's __add__ wont' work and will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.

So I'd handle that approach something like this:

class BoundIntBase(object):
    # Don't use this class directly; use a subclass that specifies low and high as
    # class attributes.
    def __init__(self, value):
        self.value = min(self.high, max(self.low, int(value)))

    def __int__(self):
        return self.value

# add binary operations to BoundInt
for method in ['__add__', '__radd__', ...]:
    def tmp(self, other):
        try:
            low = min(self.low, other.low)
            high = max(self.high, other.high)
        except AttributError:
            cls = type(self)
        else:
            cls = BountInd(low, high)
        v = getattr(int(self), method)(int(other))
        return cls(v)
    tmp.__name__ = method
    setattr(BountIntBase, method, tmp)


_bound_int_cache = {}
def BoundInt(low, low):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            low = low
            high = high
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.max)
                return int.__new__(cls, value)

        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]

Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.

like image 29
Ben Avatar answered Sep 25 '22 00:09

Ben