
Python hangs indefinitely trying to delete deeply recursive object

I wrote a ternary search tree in Python and I've noticed that when the tree gets very deep, attempting to delete it causes Python to hang indefinitely. Here is a stripped-down version of the code that reproduces this behaviour:

import random
import sys
from collections import deque


class Node():
    __slots__ = ("char", "count", "lo", "eq", "hi")

    def __init__(self, char):
        self.char = char
        self.count = 0

        self.lo = None
        self.eq = None
        self.hi = None


class TernarySearchTree():
    """Ternary search tree that stores counts for n-grams
    and their subsequences.
    """

    def __init__(self, splitchar=None):
        self.root = None
        self.splitchar = splitchar

    def insert(self, string):
        self.root = self._insert(string, self.root)

    def _insert(self, string, node):
        """Insert string at a given node.
        """
        if not string:
            return node

        char, *rest = string

        if node is None:
            node = Node(char)

        if char == node.char:
            if not rest:
                node.count += 1
                return node
            else:
                if rest[0] == self.splitchar:
                    node.count += 1
                node.eq = self._insert(rest, node.eq)

        elif char < node.char:
            node.lo = self._insert(string, node.lo)

        else:
            node.hi = self._insert(string, node.hi)

        return node


def random_strings(num_strings):
    random.seed(2)
    symbols = "abcdefghijklmnopqrstuvwxyz"

    for i in range(num_strings):
        length = random.randint(5, 15)
        yield "".join(random.choices(symbols, k=length))


def train():
    tree = TernarySearchTree("#")
    grams = deque(maxlen=4)

    for token in random_strings(27_000_000):
        grams.append(token)
        tree.insert("#".join(grams))

    sys.stdout.write("This gets printed!\n")
    sys.stdout.flush()


def main():
    train()

    sys.stdout.write("This doesn't get printed\n")
    sys.stdout.flush()


if __name__ == "__main__":
    main()

This prints out "This gets printed", but not "This doesn't get printed". Trying to delete the object manually has the same effect. If I reduce the number of strings inserted from 27 million to 25 million, everything is fine - Python prints out both statements and then exits immediately. I tried to run GDB and this is the backtrace that I get:

#0  pymalloc_free.isra.0 (p=0x2ab537a4d580) at /tmp/build/80754af9/python_1546061345851/work/Objects/obmalloc.c:1797
#1  _PyObject_Free (ctx=<optimized out>, p=0x2ab537a4d580)
    at /tmp/build/80754af9/python_1546061345851/work/Objects/obmalloc.c:1834
#2  0x0000555555701c18 in subtype_dealloc ()
    at /tmp/build/80754af9/python_1546061345851/work/Objects/typeobject.c:1256
#3  0x0000555555738ce6 in _PyTrash_thread_destroy_chain ()
    at /tmp/build/80754af9/python_1546061345851/work/Objects/object.c:2212
#4  0x00005555556cd24f in frame_dealloc (f=<optimized out>)
    at /tmp/build/80754af9/python_1546061345851/work/Objects/frameobject.c:492
#5  function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>)
    at /tmp/build/80754af9/python_1546061345851/work/Objects/call.c:291
#6  _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1546061345851/work/Objects/call.c:408
#7  0x00005555557241a6 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>)
    at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:4616
#8  _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:3124
#9  0x00005555556ccecb in function_code_fastcall (globals=<optimized out>, nargs=0, args=<optimized out>, 
    co=<optimized out>) at /tmp/build/80754af9/python_1546061345851/work/Objects/call.c:283
#10 _PyFunction_FastCallKeywords () at /tmp/build/80754af9/python_1546061345851/work/Objects/call.c:408
#11 0x00005555557241a6 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>)
    at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:4616
#12 _PyEval_EvalFrameDefault () at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:3124
#13 0x00005555556690d9 in _PyEval_EvalCodeWithName ()
    at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:3930
#14 0x0000555555669fa4 in PyEval_EvalCodeEx () at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:3959
#15 0x0000555555669fcc in PyEval_EvalCode (co=co@entry=0x2aaaaac08300, globals=globals@entry=0x2aaaaaba8168, 
    locals=locals@entry=0x2aaaaaba8168) at /tmp/build/80754af9/python_1546061345851/work/Python/ceval.c:524
#16 0x0000555555783664 in run_mod () at /tmp/build/80754af9/python_1546061345851/work/Python/pythonrun.c:1035
#17 0x000055555578d881 in PyRun_FileExFlags ()
    at /tmp/build/80754af9/python_1546061345851/work/Python/pythonrun.c:988
#18 0x000055555578da73 in PyRun_SimpleFileExFlags ()
    at /tmp/build/80754af9/python_1546061345851/work/Python/pythonrun.c:429
#19 0x000055555578db3d in PyRun_AnyFileExFlags ()
    at /tmp/build/80754af9/python_1546061345851/work/Python/pythonrun.c:84
#20 0x000055555578eb2f in pymain_run_file (p_cf=0x7fffffffd240, filename=0x5555558c5440 L"minimal.py", 
    fp=0x5555559059a0) at /tmp/build/80754af9/python_1546061345851/work/Modules/main.c:427
#21 pymain_run_filename (cf=0x7fffffffd240, pymain=0x7fffffffd350)
    at /tmp/build/80754af9/python_1546061345851/work/Modules/main.c:1627
#22 pymain_run_python (pymain=0x7fffffffd350) at /tmp/build/80754af9/python_1546061345851/work/Modules/main.c:2876
#23 pymain_main () at /tmp/build/80754af9/python_1546061345851/work/Modules/main.c:3037
#24 0x000055555578ec4c in _Py_UnixMain () at /tmp/build/80754af9/python_1546061345851/work/Modules/main.c:3072
#25 0x00002aaaaaf0d3d5 in __libc_start_main () from /lib64/libc.so.6
#26 0x0000555555733982 in _start () at ../sysdeps/x86_64/elf/start.S:103

If I try to step through from that point, execution loops over three lines in obmalloc.c - GDB reports lines 1796-98, but the numbers seem to be off, and the file in the traceback (in /tmp/) doesn't exist.

This happens on both Python 3.7.3 and 3.6, though the number of strings required to cause the hang differs (27 million was enough on both versions). At that point the process needs about 80 gigabytes of memory and takes 45 minutes to print the first statement. The version I actually use is written in Cython, which reduces memory and runtime, but faces the same issue.

Using the object works as intended, even if a billion strings are inserted. Keeping the object alive (returning it from the function or putting it into globals()) postpones the problem until Python exits - so at least I can ensure that all work is done at that point, but I would really like to know what is going wrong here.
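
For reference, the keep-alive workaround looks like this (a minimal sketch; the module-level name _keepalive is arbitrary):

_keepalive = None

def train():
    global _keepalive
    tree = TernarySearchTree("#")
    grams = deque(maxlen=4)

    for token in random_strings(27_000_000):
        grams.append(token)
        tree.insert("#".join(grams))

    # The tree now outlives train(), so its teardown is deferred
    # until interpreter exit.
    _keepalive = tree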

Edit: I installed Python via conda (4.6.2) and I am on a Linux server node:

> uname -a
Linux computingnodeX 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
asked May 20 '19 by Christian Adam


2 Answers

Update

On the bug report, a run on a giant machine showed that the time to reclaim the tree storage fell from almost 5 hours to about 70 seconds:

master:

build time 0:48:53.664428
teardown time 4:58:20.132930

patched:

build time 0:48:08.485639
teardown time 0:01:10.46670

Proposed fix

Here's a pull request against the CPython project that proposes to "fix this" by removing the searches entirely. It works fine for my 10x smaller test case, but I don't have access to a machine with anywhere near enough RAM to run the original. So I'm waiting for someone who does before merging the PR (who knows? there may be more than one "huge number of objects" design flaw here).

Original reply

Thank you for the nice job of providing an executable sample reproducing your problem! Alas, I can't run it - it requires far more memory than I have. If I cut the number of strings by a factor of ten, I end up with about 100,000,000 Node instances in about 8GB of RAM, and it takes about 45 seconds for garbage collection to tear down the tree (Python 3.7.3). So I'm guessing you have about a billion Node instances.

I expect you're not getting responses because there's no "general problem" known here, and it requires such a hefty machine to even try it. The python-dev mailing list may be a better place to ask, or you could open an issue on https://bugs.python.org.

The usual cause of very slow garbage collection at the end of a run is that memory got swapped out to disk, and then it takes thousands of times longer "than normal" to read objects back into RAM, in "random" order, to tear them down. I'm assuming that's not happening here, though. If it were, CPU usage would usually drop to near 0, as the process spent most of its time waiting for disk reads.

Less often, some bad pattern is hit in the underlying C library's malloc/free implementation. But that also seems unlikely here, because these objects are small enough that Python only asks C for "big chunks" of RAM and carves them up itself.
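
If you want to look at pymalloc's bookkeeping directly, CPython exposes sys._debugmallocstats(), which dumps the allocator's counters to stderr (CPython-only, but handy for a report like this):

import sys

# Prints pymalloc's internal statistics - arena, pool and block
# counts - to stderr.
sys._debugmallocstats()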

So I don't know. Because nothing can be ruled out, you should also give details about the OS you're using, and how Python was built.

Just for fun, you could try this to get some sense of how far things get before it stalls out. First add this method to Node:

def delete(self):
    # Post-order teardown with a progress counter: if the process
    # stalls, the last count printed shows how far reclamation got.
    global killed
    if self.lo:
        self.lo.delete()
        self.lo = None
    if self.eq:
        self.eq.delete()
        self.eq = None
    if self.hi:
        self.hi.delete()
        self.hi = None
    killed += 1
    if killed % 100000 == 0:
        print(f"{killed:,} deleted")

At the end of train(), add this:

tree.root.delete()

And replace the call to main() with:

killed = 0
main()
print(killed, "killed")

Which may or may not reveal something interesting.

Didn't hang for someone else

I posted a note about this to the python-dev mailing list, and one person so far replied privately:

I started this using Python 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00) [GCC 7.3.0] :: Anaconda, Inc. on linux

$ python fooz.py
This gets printed!
This doesn't get printed

It did take ~80 GB of RAM and several hours, but did not get stuck.

So, unless someone else pops up who can reproduce it, we're probably out of luck here. You at least need to give more info about exactly which OS you're using, and how Python was built.

answered by Tim Peters


Could you try recompiling Python?

In obmalloc.c, the ARENA_SIZE macro is defined as:

#define ARENA_SIZE              (256 << 10)     /* 256KB */

This default value is not optimized for very large memory systems.

Your script spends a long time sorting arenas by the number of free pools they contain. In the worst case this is O(N^2), and that case is hit when many arenas have the same number of free pools.

Your script frees memory blocks in random order, which is close to that worst case.
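
As a toy model (my sketch, not CPython's actual code): keep the arenas in a list sorted by free-pool count, and walk an arena rightward whenever one of its pools is freed. A long run of arenas sharing the same count forces a single free to walk past the entire run:

def note_pool_freed(arenas, i):
    # arenas: free-pool counts, kept in ascending order.
    # Arena i just gained a free pool; walk it right to restore order.
    arenas[i] += 1
    while i + 1 < len(arenas) and arenas[i] > arenas[i + 1]:
        arenas[i], arenas[i + 1] = arenas[i + 1], arenas[i]
        i += 1  # one free can cross O(N) arenas, so N frees cost O(N^2)

# Worst case: every arena shares one count, so this single call
# walks the whole run.
counts = [5] * 100_000
note_pool_freed(counts, 0)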

Here N is the number of arenas. When you change ARENA_SIZE to (1024 << 10), each arena is 4x larger, N becomes 1/4 of what it was, and N^2 becomes 1/16.
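
If you do rebuild, the change is that single line in Objects/obmalloc.c - for example, 1MB arenas:

#define ARENA_SIZE              (1024 << 10)    /* 1MB */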


If you cannot recompile Python, you can use malloc instead of pymalloc:

$ PYTHONMALLOC=malloc python3 yourscript.py

You can also override malloc with jemalloc or tcmalloc using the LD_PRELOAD environment variable.
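
For example (the library path is an assumption - adjust it to wherever your distribution installs jemalloc):

$ LD_PRELOAD=/usr/lib64/libjemalloc.so PYTHONMALLOC=malloc python3 yourscript.py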

answered by methane