Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to check if an iterable allows more than one pass?

In Python 3, how can I check whether an object is a container (rather than an iterator that may allow only one pass)?

Here's an example:

def renormalize(cont):
    '''
    each value from the original container is scaled by the same factor
    such that their total becomes 1.0
    '''
    total = sum(cont)
    for v in cont:
        yield v/total

list(renormalize(range(5))) # [0.0, 0.1, 0.2, 0.3, 0.4]
list(renormalize(k for k in range(5))) # [] - a bug!

Obviously, when the renormalize function receives a generator expression, it does not work as intended. It assumes it can iterate through the container multiple times, while the generator allows only one pass through it.

Ideally, I'd like to do this:

def renormalize(cont):
    if not is_container(cont):
      raise ContainerExpectedException
    # ...

How can I implement is_container?

I suppose I could check if the argument is empty right as we're starting to do the second pass through it. But this approach doesn't work for more complicated functions where it's not obvious when exactly the second pass starts. Furthermore, I'd rather put the validation at the function entrance, rather than deep inside the function (and shift it around whenever the function is modified).

I can of course rewrite the renormalize function to work correctly with a one-pass iterator. But that require copying the input data to a container. The performance impact of copying millions of large lists "just in case they are not lists" is ridiculous.

EDIT: My original example used a weighted_average function:

def weighted_average(c):
    '''
    returns weighted average of a container c
    c contains values and weights in tuples
    weights don't need to sum up 1 (automatically renormalized)
    '''
    return sum((v * w for v, w in c)) / sum((w for v, w in c))

weighted_average([(0,1), (1,1)]) #0.5 
weighted_average([(k, 1) for k in range(2)]) #0.5
weighted_average((k, 1) for k in range(2)) #mistake

But it was not the best example since the version of weighted_average rewritten to use a single pass is arguably better anyway:

def weighted_average(it):
    '''
    returns weighted average of an iterator it
    it yields values and weights in tuples
    weights don't need to sum up 1 (automatically renormalized)
    '''
    total_value = 0
    total_weight = 0
    for v, w in it:
        total_value += v
        total_weight += w
    return total_value / total_weight
like image 566
max Avatar asked Jan 24 '12 20:01

max


1 Answers

Although all iterables should subclass collections.Iterable, not all of them do, unfortunately. Here is an answer based on what interface the objects implement, instead of what they "declare".

Short answer:

A "container" as you call it, ie a list/tuple that can be iterated over more than once as opposed to being a generator that will be exhausted, will typically implement both __iter__ and __getitem__. Hence you can do this:

>>> def is_container_iterable(o):
...     return hasattr(o, '__iter__') and hasattr(o, '__getitem__')
... 
>>> is_container_iterable([])
True
>>> is_container_iterable(())
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter([]))
False

Long answer:

However, you can make an iterable that will not be exhausted and do not support getitem. For example, a function that generates prime-numbers. You could repeat the generation many times if you want, but having a function to retrieve the 1065th prime would take a lot of calculation, so you may not want to support that. :-)

So is there any more "reliable" way?

Well, all iterables will implement an __iter__ function that will return an iterator. The iterators will have a __next__ function. This is what is used when iterating over it. Calling __next__ repeatedly will in the end exhaust the iterator.

So if it has a __next__ function it is an iterator, and will be exhausted.

>>> def foo():
...    for x in range(5):
...        yield x
... 
>>> f = foo()
>>> f.__next__
<method-wrapper '__next__' of generator object at 0xb73c02d4>

Iterables that are not yet iterators will not have a __next__ function, but will implement a __iter__ function, that will return an iterable:

>>> r = range(5)
>>> r.__next__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'range' object has no attribute '__next__'
>>> ri = iter(r)
>>> ri.__next__
<method-wrapper '__next__' of range_iterator object at 0xb73bef80>

So you can check that the object has __iter__ but that it does not have __next__.

>>> def is_container_iterable(o):
...     return hasattr(o, '__iter__') and not hasattr(o, '__next__')
... 
>>> is_container_iterable(())
True
>>> is_container_iterable([])
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter(range(5)))
False

Iterators also has an __iter__ function, that will return self.

>>> iter(f) is f
True
>>> iter(r) is r
False
>>> iter(ri) is ri
True

Hence, you can do these variations of the checking:

>>> def is_container_iterable(o):
...     return iter(o) is not o
... 
>>> is_container_iterable([])
True
>>> is_container_iterable(())
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter([]))
False

That would fail if you implement an object that returns a broken iterator, one that does not return self when you call iter() on it again. But then your (or a third-party modules) code is actually doing things wrong.

It does depends on making an iterator though, and hence calling the objects __iter__, which in theory may have side-effects, while the above hasattr calls should not have side effects. OK, so it calls getattribute which could have. But you can fix that thusly:

>>> def is_container_iterable(o):
...     try:
...         object.__getattribute__(o, '__iter__')
...     except AttributeError:
...         return False
...     try:
...         object.__getattribute__(o, '__next__')
...     except AttributeError:
...         return True
...     return False
... 
>>> is_container_iterable([])
True
>>> is_container_iterable(())
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter(range(5)))
False

This one is reasonably safe, and should work in all cases except if the object generates __next__ or __iter__ dynamically on __getattribute__ calls, but if you do that you are insane. :-)

Instinctively my preferred version would be iter(o) is o, but I haven't ever needed to do this, so that's not based on experience.

like image 75
Lennart Regebro Avatar answered Sep 27 '22 23:09

Lennart Regebro