Are Lisp lists always implemented as linked lists under the hood?
Is this a problem as far as processor caching goes? If so, are there solutions that use more contiguous structures which help caching?
Linked lists are one of Lisp's major data structures, and Lisp source code is made of lists. Thus, Lisp programs can manipulate source code as a data structure, giving rise to the macro systems that allow programmers to create new syntax or new domain-specific languages embedded in Lisp.
Present day's Common LISP provides other data structures like, vector, hash table, classes or structures. Lists are single linked lists. In LISP, lists are constructed as a chain of a simple record structure named cons linked together.
Linked lists differ from lists in the way that they store elements in memory. While lists use a contiguous memory block to store references to their data, linked lists store references as part of their own elements.
Using linked list binary search will take O(n) time. So binary search is inefficient with linked list.
Rainer has already mentioned that various memory management techniques help with locality. I'd like to present two experiments, using SBCL, that illustrate his point.
First off, a quick utility to print the addresses of each cons in a list.
(defun print-addresses (list)
(mapl (lambda (cons)
(format t "address: 0x~X~%"
(sb-kernel:get-lisp-obj-address cons)))
list))
In the first experiment, we can see that allocation is contiguous, so we can create a list with ten elements and poking at their raw addresses shows us that they're close together:
> (print-addresses (loop repeat 10 collect 'dummy))
address: 0x1003F57167
address: 0x1003F57177
address: 0x1003F57187
address: 0x1003F57197
address: 0x1003F571A7
address: 0x1003F571B7
address: 0x1003F571C7
address: 0x1003F571D7
address: 0x1003F571E7
address: 0x1003F571F7
Second experiment. What if we do some unrelated allocation in between? Let's assign such a list to a variable so we can poke it later.
(defparameter *another-list*
(loop repeat 10
;; using eval to trick the compiler into
;; compiling this piece of dummy code
do (eval '(make-array (random 1000)))
collect 'dummy))
We can see that the addresses are more random this time:
> (print-addresses *another-list*)
address: 0x10046E9AF7
address: 0x10046EB367
address: 0x10046ECB97
address: 0x10046EE827
address: 0x10046EF247
address: 0x10046F1F17
address: 0x10046F2007
address: 0x10046F3FD7
address: 0x10046F5E67
address: 0x10046F6887
Now, if we invoke the GC with (sb-ext:gc)
, we can see that it has packed the conses together:
> (sb-ext:gc)
> (print-addresses *another-list*)
address: 0x1004738007
address: 0x1004738017
address: 0x1004738027
address: 0x1004738037
address: 0x1004738047
address: 0x1004738057
address: 0x1004738067
address: 0x1004738077
address: 0x1004738087
address: 0x1004738097
In these examples we didn't assess the locality of the list elements, I guess that's an experiment for another day. :-)
Linked pairs are the usual implementation, but there have been other approaches in the past.
CDR-coding is a list compression scheme that was designed to improve the contiguity and data size of cons lists that was supported in hardware on some of the Lisp machines. The basic idea is to use a tag to indicate the shape of a cons: one of the possibilities is to store the next cons directly after the first element, essentially eliding the cdr
field.
This next cons can itself be compressed in the same way, and so in favourable situations you end up with an array-like structure with excellent contiguity. (It's not quite an array as there must be space for the tagging information, and you can't index into it.)
One of the tricky parts is efficiently supporting mutation of both car
and cdr
of compressed conses. (See Steele's paper, "Destructive Reordering of CDR-Coded Lists".) If the conses are immutable the tagging scheme can be simpler. This FAQ has some interesting discussion on the tradeoffs.
A downside of CDR-coding is that because a cons can be various different 'shapes', dispatch on the tag is needed for list operations. This introduces code size and branch misprediction costs. These costs make the feature considerably less attractive, to the point that I don't know of any modern Lisp implementation that uses CDR-coding.
Where contiguity is a concern, a Lisp programmer would usually simply use an array.
Lisp implementations often can store some values directly in cons cells: fixnums, characters, ... For everything else a pointer will be stored in the car
or cdr
.
Nowadays almost all implementations which are using cons cells don't use optimizations like cdr-coding
.
Memory locality usually is improved by using a copying / compacting / generational garbage collector.
copying -> when a space is full, the GC copies the list and will allocate the new cells next to each other in a new storage area
compacting -> some scheme to get rid of memory gaps or similar
generational -> longer living objects will be promoted to different storage areas. Thus a list which has survived some GCs will get copyied to another generation and the cells will be allocated next to each other.
Sometimes above GC stretegies are combined in fancy ways.
Also note that in many Lisp programs many of those cons cells may be short-lived:
(mapcar #'1+
(mapcar #'isqrt '(10 20 30 40 50)) ; <- result is 'garbage'
)
The list of integer square roots is immediately garbage. The function will just walk through the fresh cons cells and allocate new fresh cons cells and there won't be much cache non-locality.
Allocation of cons cells can be reduced by using destructive operations. Above can be written as:
CL-USER 24 > (let ((s (mapcar #'isqrt '(10 20 30 40 50))))
(map-into s #'1+ s))
(4 5 6 7 8)
This will get rid of one allocated list and further improves locality.
The philosophically "right" answer would be "Lisp has no lists, only CONSes". Conses are typically used to build lists, to the point that a lot of functions in the CL standard and in libraries operate on those kinds of lists. But conses can also be used to build other kinds of structures, like maps or graphs. So in (traditional) Lisps the fundamental data structure is the cons, not the list. The list is only one convenient application of the cons. So, by "Lisp lists" you are really meaning "lists implemented with Lisp conses" and those, well, cannot be implemented with something different from conses ;)
Then of course there are techniques like CDR-coding, mentioned in other answers, that may be used to represent certain cons-based structures efficiently. There are also libraries that provide list data structures that are not based on linked conses (like FSet for Common Lisp).
This is true for "traditional" Lisps like Common Lisp and Scheme. Clojure does have lists as a fundamental data type and AFAIK doesn't have conses at all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With