In Python 3, how can I check whether an object is a container (rather than an iterator that may allow only one pass)?
Here's an example:
def renormalize(cont):
    '''
    each value from the original container is scaled by the same factor
    such that their total becomes 1.0
    '''
    total = sum(cont)
    for v in cont:
        yield v/total

list(renormalize(range(5)))             # [0.0, 0.1, 0.2, 0.3, 0.4]
list(renormalize(k for k in range(5)))  # [] - a bug!
Obviously, when the renormalize function receives a generator expression, it does not work as intended. It assumes it can iterate through the container multiple times, while the generator allows only one pass through it.
Ideally, I'd like to do this:
def renormalize(cont):
    if not is_container(cont):
        raise ContainerExpectedException
    # ...
How can I implement is_container?
I suppose I could check if the argument is empty right as we're starting to do the second pass through it. But this approach doesn't work for more complicated functions where it's not obvious when exactly the second pass starts. Furthermore, I'd rather put the validation at the function entrance, rather than deep inside the function (and shift it around whenever the function is modified).
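One way to keep such validation at the function entrance, without burying it in the function body, is a small decorator. This is only a sketch: the decorator name is made up, it uses the `iter(o) is o` iterator test, and it raises TypeError rather than a custom ContainerExpectedException.

```python
import functools

def require_container(fn):
    """Reject one-pass iterators at the function boundary (hypothetical name)."""
    @functools.wraps(fn)
    def wrapper(cont, *args, **kwargs):
        # An iterator returns itself from iter(); a reusable container does not.
        if iter(cont) is cont:
            raise TypeError("container expected, got a one-pass iterator")
        return fn(cont, *args, **kwargs)
    return wrapper

@require_container
def renormalize(cont):
    total = sum(cont)
    for v in cont:
        yield v / total
```

The check runs when the function is called, before any iteration happens, so the failure points at the call site instead of surfacing later as a silently empty result.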
I can of course rewrite the renormalize function to work correctly with a one-pass iterator. But that requires copying the input data into a container. The performance impact of copying millions of large lists "just in case they are not lists" is ridiculous.
EDIT: My original example used a weighted_average function:
def weighted_average(c):
    '''
    returns weighted average of a container c
    c contains values and weights in tuples
    weights don't need to sum up to 1 (automatically renormalized)
    '''
    return sum(v * w for v, w in c) / sum(w for v, w in c)

weighted_average([(0, 1), (1, 1)])            # 0.5
weighted_average([(k, 1) for k in range(2)])  # 0.5
weighted_average((k, 1) for k in range(2))    # mistake
But it was not the best example, since the version of weighted_average rewritten to use a single pass is arguably better anyway:
def weighted_average(it):
    '''
    returns weighted average of an iterator it
    it yields values and weights in tuples
    weights don't need to sum up to 1 (automatically renormalized)
    '''
    total_value = 0
    total_weight = 0
    for v, w in it:
        total_value += v * w
        total_weight += w
    return total_value / total_weight
Although all iterables should subclass collections.abc.Iterable (collections.Iterable before Python 3.3), not all of them do, unfortunately. Here is an answer based on what interface the objects implement, instead of what they "declare".
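(For comparison, a declaration-based check in modern Python would use the ABCs from collections.abc. isinstance with these ABCs also accepts classes that merely implement the right methods, thanks to a subclass hook, so in practice it catches most well-behaved types.)

```python
from collections.abc import Iterable, Iterator

def is_container_iterable(o):
    # Every iterator is itself iterable, so exclude iterators explicitly.
    return isinstance(o, Iterable) and not isinstance(o, Iterator)

is_container_iterable([])        # True
is_container_iterable(iter([]))  # False
```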
Short answer:
A "container" as you call it, ie a list/tuple that can be iterated over more than once as opposed to being a generator that will be exhausted, will typically implement both __iter__
and __getitem__
. Hence you can do this:
>>> def is_container_iterable(o):
...     return hasattr(o, '__iter__') and hasattr(o, '__getitem__')
...
>>> is_container_iterable([])
True
>>> is_container_iterable(())
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter([]))
False
Long answer:
However, you can make an iterable that will not be exhausted and does not support __getitem__. For example, a function that generates prime numbers: you could repeat the generation many times if you want, but a function to retrieve the 1065th prime would take a lot of calculation, so you may not want to support that. :-)
So is there any more "reliable" way?
Well, all iterables will implement an __iter__ function that returns an iterator. The iterator will have a __next__ function, which is what is used when iterating over it. Calling __next__ repeatedly will in the end exhaust the iterator.
So if it has a __next__ function it is an iterator, and will be exhausted.
>>> def foo():
...     for x in range(5):
...         yield x
...
>>> f = foo()
>>> f.__next__
<method-wrapper '__next__' of generator object at 0xb73c02d4>
Iterables that are not yet iterators will not have a __next__ function, but will implement an __iter__ function that returns an iterator:
>>> r = range(5)
>>> r.__next__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'range' object has no attribute '__next__'
>>> ri = iter(r)
>>> ri.__next__
<method-wrapper '__next__' of range_iterator object at 0xb73bef80>
So you can check that the object has __iter__ but does not have __next__.
>>> def is_container_iterable(o):
...     return hasattr(o, '__iter__') and not hasattr(o, '__next__')
...
>>> is_container_iterable(())
True
>>> is_container_iterable([])
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter(range(5)))
False
Iterators also have an __iter__ function, which will return self.
>>> iter(f) is f
True
>>> iter(r) is r
False
>>> iter(ri) is ri
True
Hence, you can do this variation of the check:
>>> def is_container_iterable(o):
...     return iter(o) is not o
...
>>> is_container_iterable([])
True
>>> is_container_iterable(())
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter([]))
False
That would fail if you implement an object that returns a broken iterator, one that does not return self when you call iter() on it again. But then your (or a third-party module's) code is actually doing things wrong.
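A contrived sketch of such a broken object (the class name is made up): its __iter__ hands out the underlying iterator instead of self, so the iter(o) is o test misclassifies it as a reusable container even though it is exhausted after one pass.

```python
class BrokenIterator:
    """An iterator whose __iter__ wrongly returns the wrapped iterator
    instead of self (contrived example for illustration)."""
    def __init__(self, data):
        self._it = iter(data)
    def __iter__(self):
        return self._it   # broken: should be `return self`
    def __next__(self):
        return next(self._it)
```

Here iter(BrokenIterator([1, 2, 3])) is not the object itself, so the check reports "container", yet a second pass over the object yields nothing.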
It does depend on making an iterator, though, and hence on calling the object's __iter__, which in theory may have side effects, while the hasattr calls above should not have side effects. OK, so hasattr calls __getattribute__, which could have side effects too. But you can fix that like this:
>>> def is_container_iterable(o):
...     try:
...         object.__getattribute__(o, '__iter__')
...     except AttributeError:
...         return False
...     try:
...         object.__getattribute__(o, '__next__')
...     except AttributeError:
...         return True
...     return False
...
>>> is_container_iterable([])
True
>>> is_container_iterable(())
True
>>> is_container_iterable({})
True
>>> is_container_iterable(range(5))
True
>>> is_container_iterable(iter(range(5)))
False
This one is reasonably safe, and should work in all cases except if the object generates __next__ or __iter__ dynamically on __getattribute__ calls, but if you do that you are insane. :-)
Instinctively my preferred version would be iter(o) is o, but I haven't ever needed to do this, so that's not based on experience.