I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:
def flat(iterable):
try:
iter(iterable)
except TypeError:
yield iterable
else:
for item in iterable:
yield from flatten(item)
This works fine for basic structures like nested list
s containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:
def flatter(iterable):
try:
iter(iterable)
if isinstance(iterable, str):
raise TypeError
except TypeError:
yield iterable
else:
for item in iterable:
yield from flatten(item)
Now it works for strings as well. However, I then recalled that a list
can contain references to itself.
>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True
So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.
The following flattener.py
ensued. flattish()
is a version that just checks for strings. flatten_notype()
checks whether an object's first item's first item is equal to itself to determine recursion. flatten()
does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake
class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form should be `desired_result` [> `undesired_actual_result`]
. As you can see, each fails in various ways on Fake
wrapped around a string, Fake
wrapped around a list
of integers, single-character strings, and multiple-character strings.
def flattish(*i):
for item in i:
try: iter(item)
except: yield item
else:
if isinstance(item, str): yield item
else: yield from flattish(*item)
class Fake:
def __init__(self, l):
self.l = l
self.index = 0
def __iter__(self):
return self
def __next__(self):
if self.index >= len(self.l):
raise StopIteration
else:
self.index +=1
return self.l[self.index-1]
def __str__(self):
return str(self.l)
def flatten_notype(*i):
for item in i:
try:
n = next(iter(item))
try:
n2 = next(iter(n))
recur = n == n2
except TypeError:
yield from flatten(*item)
else:
if recur:
yield item
else:
yield from flatten(*item)
except TypeError:
yield item
def flatten(*i):
for item in i:
try:
n = next(iter(item))
try:
n2 = next(iter(n))
recur = n == n2
except TypeError:
yield from flatten(*item)
else:
if recur:
yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
else:
yield from flatten(*item)
except TypeError:
yield item
f = Fake('abc')
print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake([1, 2, 3])
print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`
f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
f = Fake([1, 2, 3])
print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`
I've also tried the following with the recursive lst
defined above and flatten()
:
>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0
As you can see, it fails similarly to 1 ('a',) bc
as well as in its own special way.
I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst
contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2])
.
Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.
I don't think there's a reliable way to find out if an arbitrary iterable is infinite. The best we can is to yield primitives infinitely from such an iterable without exhausting the stack, for example:
from collections import deque
def flat(iterable):
d = deque([iterable])
def _primitive(x):
return type(x) in (int, float, bool, str, unicode)
def _next():
x = d.popleft()
if _primitive(x):
return True, x
d.extend(x)
return False, None
while d:
ok, x = _next()
if ok:
yield x
xs = [1,[2], 'abc']
xs.insert(0, xs)
for p in flat(xs):
print p
The above definition of "primitive" is, well, primitive, but that surely can be improved.
The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an
__iter__()
or__getitem__()
method. [...]
The key phrase here is "any classes [defined] with an __iter__()
or __getitem__()
method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString( )
object, and adjust the comparison methods accordingly.
Accessing a substring by index location is a way of creating a new string, so a timestampedString( )
of len( ) == 1
could legitimately return a timestampedString( )
of len( ) == 1
with the same character but a new timestamp when you access timestampedString( )[0:1]
. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.
To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.
However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.
If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With