
Avoiding Python sum default start arg behavior

Tags: python, sum

I am working with a Python object that implements __add__ but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) raises a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle 0 + MyObj, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
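
A minimal reproduction of the failure described above might look like the following. The class body is a hypothetical stand-in (the real object is not shown in the question); only the shape of __add__ matters here.

```python
class MyObj:
    # Hypothetical minimal class: supports MyObj + MyObj and nothing else.
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, MyObj):
            return NotImplemented
        return MyObj(self.value + other.value)


a, b = MyObj("x"), MyObj("y")
print((a + b).value)  # xy

try:
    sum([a, b])  # sum() evaluates 0 + a first, which neither int nor MyObj handles
except TypeError as exc:
    print("TypeError:", exc)
```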

Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.

Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding two SimpleLocs yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.

The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.

Questions I've looked at:

  • python's sum() and non-integer values
  • why there's a start argument in python's built-in sum function
  • TypeError after overriding the __add__ method

I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.

My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.

# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)
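
For reference, a self-contained sketch of this workaround is shown below. The Loc class and its intervals attribute are hypothetical stand-ins for the real module, and this variant returns NotImplemented for anything other than the integer 0 instead of delegating to __add__, so that 1 + Loc still raises a TypeError.

```python
class Loc:
    """Hypothetical stand-in for the asker's object: holds a tuple of intervals."""

    def __init__(self, *intervals):
        self.intervals = tuple(intervals)

    def __add__(self, other):
        if not isinstance(other, Loc):
            return NotImplemented
        return Loc(*(self.intervals + other.intervals))

    def __radd__(self, other):
        # sum() starts from 0, so its first operation is 0 + Loc.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


total = sum([Loc((3, 6)), Loc((10, 13))])
print(total.intervals)  # ((3, 6), (10, 13))
```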

In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?

Lenna asked Jul 24 '12 05:07


4 Answers

Instead of sum, use:

import operator
from functools import reduce
reduce(operator.add, seq)

In Python 2, reduce was a built-in, so this looks like:

import operator
reduce(operator.add, seq)

Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.
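
A short sketch of how reduce behaves with and without an initial element, using lists of strings as an arbitrary example of objects that support +:

```python
import operator
from functools import reduce

words = [["a"], ["b", "c"], ["d"]]

# No initial element: folds left as ((w0 + w1) + w2), so no neutral element is needed.
print(reduce(operator.add, words))  # ['a', 'b', 'c', 'd']

# A one-element sequence is returned as-is; operator.add is never called.
print(reduce(operator.add, [["a"]]))  # ['a']

# An empty sequence has no answer without an initial element.
try:
    reduce(operator.add, [])
except TypeError as exc:
    print("TypeError:", exc)

# With an initial element, the empty case is well defined.
print(reduce(operator.add, [], []))  # []
```

The empty-sequence case is the one place where an initial (neutral) element is unavoidable, whichever function is used.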


Also note: (Warning: maths rant ahead)

Providing support for add w/r/t objects that have no neutral element is a bit awkward from an algebraic point of view.

Note that all of:

  • naturals
  • reals
  • complex numbers
  • N-d vectors
  • NxM matrices
  • strings

together with addition form a Monoid - i.e. they are associative and have some kind of neutral element.

If your operation isn't associative or doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.

In such a case, you might be better off using a function or a method instead of an operator. This may be less confusing, since users of your class, seeing that it supports +, are likely to expect that it behaves in a monoidal way (as addition normally does).


Thanks for expanding, I'll refer to your particular module now:

There are 2 concepts here:

  • Simple locations,
  • Compound locations.

It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.

OTOH, CompoundLocs with addition look like a monoid to me (a commutative monoid, while we're at it): a sum of those is a CompoundLoc too, their addition is associative and commutative, and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.

If you agree with me (and the above matches your implementation), then you'll be able to use sum as follows:

sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())

Indeed, this appears to work.
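
To make this concrete, here is a minimal sketch under stated assumptions: the class bodies below are hypothetical (the real module is not shown), both classes expose their intervals as a tuple, and overlapping intervals are not merged.

```python
class CompoundLoc:
    """Hypothetical sketch: an ordered collection of [start, end) intervals."""

    def __init__(self, intervals=()):
        self.intervals = tuple(intervals)

    def __add__(self, other):
        if isinstance(other, (SimpleLoc, CompoundLoc)):
            return CompoundLoc(self.intervals + other.intervals)
        return NotImplemented


class SimpleLoc:
    """Hypothetical sketch: a single [start, end) interval."""

    def __init__(self, start, end):
        self.intervals = ((start, end),)

    def __add__(self, other):
        # Closure fails here on purpose: the result is a CompoundLoc.
        if isinstance(other, (SimpleLoc, CompoundLoc)):
            return CompoundLoc(self.intervals + other.intervals)
        return NotImplemented


locs = [SimpleLoc(3, 6), SimpleLoc(10, 13)]
total = sum(locs, CompoundLoc())
print(total.intervals)                   # ((3, 6), (10, 13))
print(sum([], CompoundLoc()).intervals)  # ()
```

Passing the start value positionally works on every Python version; the start= keyword form is only accepted by newer Python 3 releases.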


Regarding this part of the question: "I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits."

Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
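
A set-like interface on top of endpoint pairs could be sketched roughly as follows. The Loc class is hypothetical and assumes the stored intervals are disjoint.

```python
from itertools import chain


class Loc:
    """Hypothetical sketch: stores only [start, end) endpoint pairs."""

    def __init__(self, *intervals):
        # Assumes the intervals are disjoint; overlaps are not merged here.
        self.intervals = sorted(intervals)

    def __contains__(self, n):
        return any(start <= n < end for start, end in self.intervals)

    def __iter__(self):
        # Lazily walk every integer covered by the intervals, in order.
        return chain.from_iterable(range(s, e) for s, e in self.intervals)

    def __len__(self):
        return sum(e - s for s, e in self.intervals)


loc = Loc((3, 6), (10, 13))
print(list(loc))  # [3, 4, 5, 10, 11, 12]
print(len(loc))   # 6
print(11 in loc)  # True
print(6 in loc)   # False
```

Membership and length are computed from the endpoints alone, so no set of individual integers is ever materialized.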

As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.
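
Such a utility might be sketched like this. The name to_intervals is hypothetical, and it returns plain (start, end) tuples rather than SimpleLoc or CompoundLoc instances, since those constructors are not shown in the question.

```python
def to_intervals(numbers):
    """Collapse integers into sorted, merged [start, end) pairs."""
    intervals = []
    for n in sorted(set(numbers)):
        if intervals and intervals[-1][1] == n:
            # n directly follows the current run, so extend it.
            intervals[-1][1] = n + 1
        else:
            # n starts a new run.
            intervals.append([n, n + 1])
    return [tuple(pair) for pair in intervals]


print(to_intervals([10, 3, 4, 5, 12, 11]))  # [(3, 6), (10, 13)]
```

One pair in the result would map to a SimpleLoc, and several pairs to a CompoundLoc.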

Kos answered Nov 11 '22 11:11


I think that the best way to accomplish this is to provide the __radd__ method, or pass the start object to sum explicitly.

In case you really do not want to override __radd__ or provide a start object, how about redefining sum()?

>>> from __builtin__ import sum as builtin_sum  # Python 2; the module is named builtins in Python 3
>>> def sum(iterable, startobj=MyCustomStartObject):
...     return builtin_sum(iterable, startobj)
... 

Preferably, use a function with a name like my_sum() instead, though I guess that is one of the things you want to avoid. Globally redefining built-in functions is probably something a future maintainer will curse you for.
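
As one possible shape for such a my_sum(), here is a sketch of a variant that needs no custom start object at all: when no start is given, it uses the first item of the iterable as the start value. The _MISSING sentinel and the ValueError for empty input are choices made for this illustration, not part of the built-in sum.

```python
_MISSING = object()  # sentinel meaning "no start value was passed"


def my_sum(iterable, start=_MISSING):
    """Like sum(), but defaults to the first item instead of 0."""
    it = iter(iterable)
    if start is _MISSING:
        try:
            start = next(it)
        except StopIteration:
            raise ValueError("my_sum() of empty iterable with no start value") from None
    # The built-in does the rest, starting from the chosen value.
    return sum(it, start)


print(my_sum([[1], [2, 3]]))  # [1, 2, 3]
print(my_sum([[7]]))          # [7]
print(my_sum([], []))         # []
```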

Kimvais answered Nov 11 '22 10:11


Actually, implementing __add__ without the concept of an "empty object" makes little sense. sum needs a start parameter to support the sums of empty and one-element sequences, and you have to decide what result you expect in these cases:

sum([o1, o2]) => o1 + o2  # obviously
sum([o1]) => o1  # But how should __add__ be called here?  Not at all?
sum([]) => ?  # What now?
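
What the built-in actually does in these edge cases can be checked with a small tracing class. Traced is a hypothetical helper written for this illustration; it records every reflected addition it receives.

```python
calls = []


class Traced:
    # Hypothetical helper: logs each reflected addition made against it.
    def __init__(self, name):
        self.name = name

    def __radd__(self, other):
        calls.append((other, self.name))
        return self


print(sum([]))       # 0
print(sum([], 10))   # 10

o1 = Traced("o1")
result = sum([o1])   # evaluates 0 + o1, which falls back to o1.__radd__(0)
print(result is o1)  # True
print(calls)         # [(0, 'o1')]
```

So even a one-element sequence triggers one addition against the start value, and an empty sequence simply returns the start value.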

Ferdinand Beyer answered Nov 11 '22 12:11


You could use an object that's universally neutral wrt. addition:

class Neutral:
    def __add__(self, other):
        return other

print(sum("A BC D EFG".split(), Neutral())) # ABCDEFG
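
One caveat worth checking with this approach is the empty case, sketched below: with nothing to add, sum() returns the Neutral instance itself, so callers may need to handle that value explicitly.

```python
class Neutral:
    def __add__(self, other):
        # Neutral is only ever the left operand, so it just yields the right one.
        return other


print(sum([[1], [2, 3]], Neutral()))  # [1, 2, 3]

empty_result = sum([], Neutral())
print(isinstance(empty_result, Neutral))  # True
```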

Reinstate Monica answered Nov 11 '22 10:11