 

Will Python automatically garbage-collect a doubly-linked list? [duplicate]

Background

I have a tree structure. Within this tree, I maintain the children of each node as a doubly-linked list:

(Image: diagram of a doubly-linked list. Source: Doubly linked list)

(I chose this structure because I create the list with a breadth-first search.)
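A minimal sketch of such a structure, to make the question concrete. The class name TreeNode, the attributes first_child, last_child, prev_sibling and next_sibling, and the breadth_first helper are all hypothetical, not taken from the asker's code; the point is only that every pair of adjacent siblings refers to each other.

```python
from collections import deque
from typing import Iterator, Optional


class TreeNode:
    """Hypothetical tree node whose children form a doubly-linked list."""

    def __init__(self, value: object) -> None:
        self.value = value
        self.first_child: Optional["TreeNode"] = None
        self.last_child: Optional["TreeNode"] = None
        self.prev_sibling: Optional["TreeNode"] = None
        self.next_sibling: Optional["TreeNode"] = None

    def add_child(self, child: "TreeNode") -> None:
        """Append child to the end of this node's sibling list."""
        child.prev_sibling = self.last_child
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next_sibling = child
        self.last_child = child

    def children(self) -> Iterator["TreeNode"]:
        """Yield the children by following next_sibling links."""
        node = self.first_child
        while node is not None:
            yield node
            node = node.next_sibling


def breadth_first(root: TreeNode) -> Iterator[TreeNode]:
    """Visit nodes level by level, the order a breadth-first builder creates them."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children())


root = TreeNode("root")
for name in ("a", "b", "c"):
    root.add_child(TreeNode(name))
root.first_child.add_child(TreeNode("a1"))
print([node.value for node in breadth_first(root)])  # ['root', 'a', 'b', 'c', 'a1']
```

Holding only `root` keeps the whole tree reachable, but the sibling links also form reference cycles, which is exactly what the question is worried about.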

Problem

Now my concern is whether the garbage collector can automatically destroy this list. Naturally, I keep only a reference to the root node of such a tree. AFAIK, the principle of GC is that it collects data structures in memory that no reference points to. But in a doubly-linked list, each node is referenced by its sibling, and the sibling is referenced by the node. So there will always be a reference to a node, and the GC would never collect it.

Will the garbage collector handle a doubly-linked list?

If not, what is the easiest way to collect it?

Related questions:

Why does Lua use a garbage collector instead of reference counting?
Python: Memory usage and optimization when modifying lists

asked Jan 12 '23 17:01 by sumid


1 Answer

Each Python implementation has a different garbage collection scheme. The general-purpose answer is "Yes, if it's garbage, it should be garbage collected." But you presumably want something more specific than this.


In CPython, garbage collection uses reference counting plus a cycle collector. When an object's refcount drops to 0, it is cleaned up immediately. But in your case, when all external references to your list go away, the nodes still refer to each other, so reference counting by itself cannot free them. That's what the cycle collector is for.
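A small sketch of that distinction, assuming CPython. The Node class and make_list helper are hypothetical; gc.disable() temporarily switches off the cycle collector so only reference counting is active, and a weakref.ref watches the head node without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical doubly-linked node."""

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def make_list(n):
    """Build a doubly-linked list of n nodes and return its head."""
    head = prev = Node(0)
    for i in range(1, n):
        node = Node(i)
        node.prev = prev
        prev.next = node
        prev = node
    return head


gc.disable()  # leave only reference counting active
try:
    head = make_list(3)
    probe = weakref.ref(head)                  # observes head without owning it
    del head                                   # drop the only external reference
    survived_refcounting = probe() is not None # still alive: node 1 points back at it
    unreachable = gc.collect()                 # run the cycle collector explicitly
    survived_collector = probe() is not None   # gone: the whole cycle was reclaimed
finally:
    gc.enable()

print(survived_refcounting, survived_collector, unreachable)
```

On CPython the first two values should be True and False; the exact count returned by gc.collect() varies between versions, so it is only printed here.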

Assuming your nodes do not have __del__ methods, and you have not (directly or indirectly) disabled "supplemental garbage collection" (it's on by default), the cycle collector will detect that your nodes all refer to each other but nothing else refers to them, and clean them up. (This could take two passes, because it uses a generational system.)
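You normally don't have to call anything for this to happen. One rough way to watch the automatic collector at work is to register a hook in gc.callbacks (available since Python 3.3), create many small unreachable cycles, and never call gc.collect() yourself. The two-node Node class and the 100_000 iteration count are arbitrary choices for illustration; how many collections run, and how much each reclaims, depends on the interpreter version and its thresholds.

```python
import gc


class Node:
    """Hypothetical doubly-linked node without a __del__ method."""

    def __init__(self):
        self.prev = None
        self.next = None


collections_seen = []  # objects collected by each automatic run


def record(phase, info):
    """gc callback: called with phase 'start' or 'stop' around each collection."""
    if phase == "stop":
        collections_seen.append(info["collected"])


gc.enable()                    # already the default; shown for clarity
gc.callbacks.append(record)
try:
    for _ in range(100_000):
        a, b = Node(), Node()
        a.next, b.prev = b, a  # a two-node cycle
        del a, b               # unreachable now, yet each refcount is still 1
finally:
    gc.callbacks.remove(record)

print(len(collections_seen), "automatic collections,",
      sum(collections_seen), "objects reclaimed")
```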

You can use the gc module to explicitly run the cycle collector (gc.collect()) instead of waiting for it, or to inspect what it's doing. For example, if you do this:

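import gc

gc.collect()                     # start from a clean slate
before = len(gc.get_objects())   # objects the collector is currently tracking
del last_reference_to_list       # your own variable: the last reference to the list
found = gc.collect()             # returns the number of unreachable objects found
after = len(gc.get_objects())
print(before, after, found)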

… you should be able to tell (not with perfect reliability, but well enough for learning and testing purposes) that your nodes are all gone.
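A self-contained variant of that check, under the assumption that counting instances of one class is a good enough proxy for "are my nodes gone". The Node class is a hypothetical stand-in for the real list nodes, and the count is taken over gc.get_objects(), which lists every object the collector currently tracks.

```python
import gc


class Node:
    """Hypothetical stand-in for the real list nodes."""

    def __init__(self, value, prev=None):
        self.value = value
        self.next = None
        self.prev = prev
        if prev is not None:
            prev.next = self  # link both ways: this creates reference cycles


def count_nodes():
    """Count live Node instances among the objects the collector tracks."""
    return sum(1 for obj in gc.get_objects() if type(obj) is Node)


head = tail = Node(0)
for i in range(1, 1000):
    tail = Node(i, tail)

gc.collect()            # start from a clean slate
before = count_nodes()
del head, tail          # drop every external reference to the list
found = gc.collect()    # number of unreachable objects the collector found
after = count_nodes()
print(before, after, found)
```

Using type(obj) is Node rather than isinstance keeps the scan from touching unrelated objects' __class__ attributes.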


What if your nodes do have __del__ methods? In CPython before 3.4 (that is, before PEP 442), a cycle that contains objects with __del__ methods is not freed; the collector puts those objects in gc.garbage instead, so you have to give the GC some help. (From 3.4 on, such cycles are collected normally.) What you need to do is break any cycles that include objects with __del__ methods. The obvious way to do that, if no nodes are shared between lists, is to just walk the list and del the forward and back pointers. (Technically, you only need to del one or the other, but you might as well do both.) If you need the __del__ method on the nodes, you probably need one on the top-level dl_list (or tree_node, or whatever owns these nodes) as well, so that is the obvious place to put the cycle-breaking code.
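A minimal sketch of the cycle-breaking approach, assuming a hypothetical DLList owner class that is not itself part of any cycle. Because nothing points back at the owner, reference counting runs its __del__ as soon as the last reference disappears, and clear() then unlinks the nodes so that reference counting frees them one by one, without relying on the cycle collector at all.

```python
deleted = []  # records which nodes have been finalized


class Node:
    """Hypothetical node that defines __del__ (say, to release a resource)."""

    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

    def __del__(self):
        deleted.append(self.value)


class DLList:
    """Hypothetical owner of the nodes; it is not part of any cycle itself."""

    def __init__(self, values):
        self.head = None
        tail = None
        for value in values:
            node = Node(value)
            if tail is None:
                self.head = node
            else:
                tail.next = node
                node.prev = tail
            tail = node

    def clear(self):
        """Walk the list and drop both links on every node, breaking all cycles."""
        node = self.head
        self.head = None
        while node is not None:
            following = node.next
            node.prev = None
            node.next = None
            node = following

    def __del__(self):
        # Runs via plain reference counting, since nothing refers back to the owner.
        self.clear()


lst = DLList(range(5))
del lst          # DLList.__del__ -> clear() -> each node's refcount drops to 0
print(deleted)   # [0, 1, 2, 3, 4]
```

The printed order reflects CPython's reference counting: each node is freed as soon as its successor drops the back pointer to it.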

Of course if you don't need the __del__ method, there's an even easier solution: just get rid of it.


One last possibility is to use weakref for the back links but regular references for the forward links. That way there are no cycles of strong references, so reference counting alone can free the list. But you will have to be a bit careful when adding and removing nodes, to make sure you never temporarily leave a node with nothing but a weakref to it.
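A minimal sketch of that layout, with hypothetical names: next is an ordinary attribute, while the back link is stored as a weakref.ref and exposed through a prev property. The append helper keeps the new node strongly referenced at every step (first by its local variable, then by tail.next), which is the kind of care described above.

```python
import weakref


class Node:
    """Hypothetical node: strong forward link, weak back link."""

    def __init__(self, value):
        self.value = value
        self.next = None   # strong reference to the following node
        self._prev = None  # weakref.ref to the previous node, or None

    @property
    def prev(self):
        """The previous node, or None if unset or already freed."""
        return self._prev() if self._prev is not None else None

    @prev.setter
    def prev(self, node):
        self._prev = weakref.ref(node) if node is not None else None


def append(tail, value):
    """Create a node after tail; held by `node`, then strongly by tail.next."""
    node = Node(value)
    tail.next = node
    node.prev = tail
    return node


head = tail = Node(0)
for i in range(1, 4):
    tail = append(tail, i)

print(tail.prev.value)  # 2
del head                # no cycles: refcounting frees nodes 0..2 right away
print(tail.prev)        # None
```

The trade-off is visible in the last line: once the strong chain from the head is gone, the back link silently becomes None rather than keeping the earlier nodes alive.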


If you're using Jython or IronPython, the garbage collection is tied to the underlying runtime (JVM or .NET), so you will have to read the appropriate documentation.

PyPy has its own garbage collector (actually, a choice of different options), which is described in the PyPy documentation.

If you're using a less-common implementation, there should be similar docs available.
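If you are not sure which implementation a script is running under, platform.python_implementation() reports it (for example 'CPython' or 'PyPy'). The GC_NOTES mapping below is only a hypothetical summary of the paragraphs above, not an official table.

```python
import platform

# Hypothetical summary of where each implementation's GC behaviour comes from.
GC_NOTES = {
    "CPython": "reference counting plus the gc module's cycle collector",
    "PyPy": "PyPy's own garbage collector; see the PyPy documentation",
    "Jython": "the JVM's garbage collector",
    "IronPython": "the .NET runtime's garbage collector",
}

impl = platform.python_implementation()
print(impl, "->", GC_NOTES.get(impl, "check that implementation's own documentation"))
```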

answered Jan 25 '23 23:01 by abarnert