What is the difference between a cyclic list and an infinite list in haskell?

Tags:

Referencing @dfeuer's answer to this question: Least expensive way to construct cyclic list in Haskell, which says that using cyclic lists 'defeats' the garbage collector as it has to keep everything you've consumed from a cyclic list allocated till you drop the reference to any cons cells in the list.

Apparently in Haskell a cyclic list and an infinite list are two separate things. This blog (https://unspecified.wordpress.com/2010/03/30/a-doubly-linked-list-in-haskell/) says that if you implement cycle as follows:

cycle xs = xs ++ cycle xs

it is an infinite list, not a cyclic list. To make it cyclic you have to implement it like this (as is found in the Prelude source code):

cycle xs = xs' where xs' = xs ++ xs'

What exactly is the difference between these two implementations? And why is it that if you are holding onto one cons cell somewhere in a cyclic list, that the garbage collector has to keep everything before it allocated as well?

989

asked Aug 26 '14 05:08

Ramith Jayatilleka

2 Answers

The difference is entirely in the memory representation. From the point of view of the semantics of the language, they're indistinguishable—you can't write a function that can tell them apart, so your two versions of cycle are considered two implementations of the same function (they're the exact same mapping of arguments to results). In fact, I don't know if the language definition guarantees that one of those is cyclical and the other infinite.

But anyway, let's bring out the ASCII art. Cyclical list:

   +----+----+                 +----+----+
   | x0 |   ----->   ...   --->| xn |    |
   +----+----+                 +----+-|--+
     ^                                |
     |                                |
     +--------------------------------+

Infinite list:

   +----+----+
   | x0 |   ----->  thunk that produces infinite list
   +----+----+

The thing with the cyclical list is that from every cons cell in the list there is a path to all of the others and itself. This means that from the point of view of the garbage collector, if one of the cons cells is reachable, then all are. In the plain infinite list, on the other hand, there aren't any cycles, so from a given cons cell only its successors are reachable.

Note that the infinite list representation is more powerful than the cyclical one, because the cyclical representation only works with lists that repeat after some number of elements. For example, the list of all prime numbers can be represented as an infinite list, but not as a cyclical one.

Note also that this distinction can be generalized into two distinct ways of implementing the fix function:

fix, fix' :: (a -> a) -> a
fix  f = let result = f result in result
fix' f = f (fix' f)

-- Circular version of cycle:
cycle  xs = fix (xs++)

-- Infinite list version of cycle:
cycle' xs = fix' (xs++)

The GHC libraries go for my fix definition. The way GHC compiles code means that the thunk created for result is used both as the result and the argument of the application of f. I.e., the thunk, when forced, will call the object code for f with the thunk itself as its argument, and replace the thunk's contents with the result.

118

answered Nov 18 '22 10:11

Luis Casillas

Cyclic lists and infinite lists are different operationally, but not semantically.

A cyclic list is literally a loop in memory - imagine a singly linked list with the pointers following a cycle - so takes up constant space. Because each cell in the list can be reached from any other cell, holding onto any one cell will cause the entire list to be held onto.

An infinite list will take up more and more space as you evaluate more of it. Earlier elements will be garbage collected if no longer needed, so programs that process it may still run in constant space, although the garbage collection overhead will be higher. If earlier elements in the list are needed, for example because you hold onto a reference to the head of the list, then the list will consume linear space as you evaluate it, and will eventually exhaust available memory.

The reason for this difference is that without optimisations, a typical Haskell implementation like GHC will allocate memory once for a value, like xs' in the second definition of cycle, but will repeatedly allocate memory for a function invocation, like cycle xs in the first definition.

In principle optimisations might turn one definition into the other, but because of the quite different performance characteristics, it's unlikely that this will happen in practice as compilers are generally quite conservative about making programs behave worse. In some cases the cyclic variant will be worse because of the garbage collection properties already mentioned.

answered Nov 18 '22 10:11

GS - Apologise to Monica

Related questions
                            
                                Google Drive PHP API - Simple File Upload
                            
                                How to avoid "RuntimeWarning: invalid value encountered in divide" in NumPy?
                            
                                How to implement real time data for a web page
                            
                                Interface vs record of functions
                            
                                How to make two SQL queries really asynchronous
                            
                                How can I extract a multi-part rar file in linux? [closed]
                            
                                Javascript Object in C#
                            
                                plotting 3d vectors using Python matplotlib
                            
                                Project <PROJECT NAME> is not up to date. Missing input file 'netframework,version=v4.0,profile=client.assemblyattributes.cs
                            
                                Powershell unable to find type [System.Windows.Forms.KeyEventHandler]
                            
                                How do I create a heterogeneous collection of objects?
                            
                                Linux kernel: why do 'subclass' structs put base class info at end?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With