Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I determine whether a container is infinitely recursive and find its smallest unique container?

I was reading Flatten (an irregular) list of lists and decided to adopt it as a Python exercise - a small function I'll occasionally rewrite without referring to the original, just for practice. The first time I tried this, I had something like the following:

def flat(iterable):
    try:
        iter(iterable)
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatten(item)

This works fine for basic structures like nested lists containing numbers, but strings crash it because the first element of a string is a single-character string, the first element of which is itself, the first element of which is itself again, and so on. Checking the question linked above, I realized that that explains the check for strings. That gave me the following:

def flatter(iterable):
    try:
        iter(iterable)
        if isinstance(iterable, str):
            raise TypeError
    except TypeError:
        yield iterable
    else:
        for item in iterable:
            yield from flatten(item)

Now it works for strings as well. However, I then recalled that a list can contain references to itself.

>>> lst = []
>>> lst.append(lst)
>>> lst
[[...]]
>>> lst[0][0][0][0] is lst
True

So, a string isn't the only type that could cause this sort of problem. At this point, I started looking for a way to guard against this issue without explicit type-checking.

The following flattener.py ensued. flattish() is a version that just checks for strings. flatten_notype() checks whether an object's first item's first item is equal to itself to determine recursion. flatten() does this and then checks whether either the object or its first item's first item is an instance of the other's type. The Fake class basically just defines a wrapper for sequences. The comments on the lines that test each function describe the results, in the form should be `desired_result` [> `undesired_actual_result`]. As you can see, each fails in various ways on Fake wrapped around a string, Fake wrapped around a list of integers, single-character strings, and multiple-character strings.

def flattish(*i):
    for item in i:
        try: iter(item)
        except: yield item
        else:
            if isinstance(item, str): yield item
            else: yield from flattish(*item)

class Fake:
    def __init__(self, l):
        self.l = l
        self.index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.index >= len(self.l):
            raise StopIteration
        else:
            self.index +=1
            return self.l[self.index-1]
    def __str__(self):
        return str(self.l)

def flatten_notype(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item

def flatten(*i):
    for item in i:
        try:
            n = next(iter(item))
            try:
                n2 = next(iter(n))
                recur = n == n2
            except TypeError:
                yield from flatten(*item)
            else:
                if recur:
                    yield item if isinstance(n2, type(item)) or isinstance(item, type(n2)) else n2
                else:
                    yield from flatten(*item)
        except TypeError:
            yield item


f = Fake('abc')

print(*flattish(f)) # should be `abc`
print(*flattish((f,))) # should be `abc` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`

f = Fake([1, 2, 3])

print(*flattish(f)) # should be `1 2 3`
print(*flattish((f,))) # should be `1 2 3` > ``
print(*flattish(1, ('a',), ('bc',))) # should be `1 a bc`

f = Fake('abc')
print(*flatten_notype(f)) # should be `abc`
print(*flatten_notype((f,))) # should be `abc` > `c`
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`

f = Fake([1, 2, 3])     

print(*flatten_notype(f)) # should be `1 2 3` > `2 3`
print(*flatten_notype((f,))) # should be `1 2 3` > ``
print(*flatten_notype(1, ('a',), ('bc',))) # should be `1 a bc` > `1 ('a',) bc`

f = Fake('abc')
print(*flatten(f)) # should be `abc` > `a`
print(*flatten((f,))) # should be `abc` > `c`
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`

f = Fake([1, 2, 3])     

print(*flatten(f)) # should be `1 2 3` > `2 3`
print(*flatten((f,))) # should be `1 2 3` > ``
print(*flatten(1, ('a',), ('bc',))) # should be `1 a bc`

I've also tried the following with the recursive lst defined above and flatten():

>>> print(*flatten(lst))
[[...]]
>>> lst.append(0)
>>> print(*flatten(lst))
[[...], 0]
>>> print(*list(flatten(lst))[0])
[[...], 0] 0

As you can see, it fails similarly to 1 ('a',) bc as well as in its own special way.

I read how can python function access its own attributes? thinking that maybe the function could keep track of every object it had seen, but that wouldn't work either because our lst contains an object with matching identity and equality, strings contain objects that may only have matching equality, and equality isn't enough due to the possibility of something like flatten([1, 2], [1, 2]).

Is there any reliable way (i.e. doesn't simply check known types, doesn't require that a recursive container and its containers all be of the same type, etc.) to check whether a container holds iterable objects with potential infinite recursion, and reliably determine the smallest unique container? If there is, please explain how it can be done, why it is reliable, and how it handles various recursive circumstances. If not, please explain why this is logically impossible.

like image 848
TigerhawkT3 Avatar asked Jan 25 '16 10:01

TigerhawkT3


2 Answers

I don't think there's a reliable way to find out if an arbitrary iterable is infinite. The best we can is to yield primitives infinitely from such an iterable without exhausting the stack, for example:

from collections import deque

def flat(iterable):

    d = deque([iterable])

    def _primitive(x):
        return type(x) in (int, float, bool, str, unicode)

    def _next():
        x = d.popleft()
        if _primitive(x):
            return True, x
        d.extend(x)
        return False, None

    while d:
        ok, x = _next()
        if ok:
            yield x


xs = [1,[2], 'abc']
xs.insert(0, xs)

for p in flat(xs):
    print p

The above definition of "primitive" is, well, primitive, but that surely can be improved.

like image 179
georg Avatar answered Nov 15 '22 21:11

georg


The scenario you ask about is very loosely defined. As defined in your question, it is logically impossible "to check whether a container holds iterable objects with potential infinite recursion[.]" The only limit on the scope of your question is "iterable" object. The official Python documentation defines "iterable" as follows:

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. [...]

The key phrase here is "any classes [defined] with an __iter__() or __getitem__() method." This allows for "iterable" objects with members that are generated on demand. For example, suppose that someone seeks to use a bunch of string objects that automatically sort and compare in chronological order based on the time at which the particular string was created. They either subclass str or reimplement its functionality, adding a timestamp associated with each pointer to a timestampedString( ) object, and adjust the comparison methods accordingly.

Accessing a substring by index location is a way of creating a new string, so a timestampedString( ) of len( ) == 1 could legitimately return a timestampedString( ) of len( ) == 1 with the same character but a new timestamp when you access timestampedString( )[0:1]. Because the timestamp is part of the specific object instance, there is no kind of identity test that would say that the two objects are the same unless any two strings consisting of the same character are considered to be the same. You state in your question that this should not be the case.

To detect infinite recursion, you first need to add a constraint to the scope of your question that the container only contain static, i.e. pre-generated, objects. With this constraint, any legal object in the container can be converted to some byte-string representation of the object. A simple way to do this would be to pickle each object in the container as you reach it, and maintain a stack of the byte-string representations that result from pickling. If you allow any arbitrary static object, nothing less than a raw-byte interpretation of the objects is going to work.

However, algorithmically enforcing the constraint that the container only contain static objects presents another problem: it requires type-checking against some pre-approved list of types such as some notion of primitives. Two categories of objects can then be accommodated: single objects of a known-static type (e.g. primitives) and containers for which the number of contained items can be determined in advance. The latter category can then be shown to be finite when that many contained objects have been iterated through and all have been shown to be finite. Containers within the container can be handled recursively. The known-static type single objects are the recursive base-case.

If the container produces more objects, then it violates the definition of this category of object. The problem with allowing arbitrary objects in Python is that these objects can be defined in Python code that can use components written in C code and any other language that C can be linked to. There is no way to evaluate this code to determine if it actually complies with the static requirement.

like image 25
RichardB Avatar answered Nov 15 '22 19:11

RichardB