I'm looking for something like #'delete-duplicates
, but I know that all elements of the list are already sorted, or inversely sorted, or at least arranged so that duplicates will already be adjacent to each other. I wish to use that knowledge to ensure that execution speed is not proporational to the square of the number of elements in the list. It's trivial to use #'maplist
to grow my own solution, but is there something already in the language? It would be embarrassing to reinvent the wheel.
To be clear, for largish lengths of lists, I would like the running time of the deletion to be proportional to the length of the list, not proportional to the square of that length. This is the behavior I wish to avoid:
1 (defun one-shot (cardinality)
2 (labels ((generate-list (the-count)
3 (let* ((the-list (make-list the-count)))
4 (do ((iterator 0 (1+ iterator)))
5 ((>= iterator the-count))
6 (setf (nth iterator the-list) iterator))
7 the-list)))
8 (let* ((given-list (generate-list cardinality))
9 (stripped-list)
10 (start-time)
11 (end-time))
12 (setf start-time (get-universal-time))
13 (setf stripped-list (delete-duplicates given-list :test #'eql))
14 (setf end-time (get-universal-time))
15 (princ "for n = ")
16 (princ cardinality)
17 (princ ", #'delete-duplicates took ")
18 (princ (- end-time start-time))
19 (princ " seconds")
20 (terpri))))
21 (one-shot 20000)
22 (one-shot 40000)
23 (one-shot 80000)
for n = 20000, #'delete-duplicates took 6 seconds
for n = 40000, #'delete-duplicates took 24 seconds
for n = 80000, #'delete-duplicates took 95 seconds
There's nothing like this in the language, but something like this makes just one pass through the list:
(defun delete-adjacent-duplicates (list &key key (test 'eql))
(loop
for head = list then (cdr head)
until (endp head)
finally (return list)
do (setf (cdr head)
(member (if (null key) (car head)
(funcall key (car head)))
(cdr head)
:key key :test-not test))))
As, @wvxvw pointed out, it might be possible to simplify this iteration using (loop for head on list finally (return list) do ...)
. However, 3.6 Traversal Rules and Side Effects says that modifying the cdr
chain of a list during an object-traversal leads to undefined behavior. However, it's not clear whether loop for head on list
is technically an object-traversal operation or not. The documentation about loop says in 6.1.2.1.3 The for-as-on-list subclause that
In the for-as-on-list subclause, the for or as construct iterates over a list. … The variable var is bound to the successive tails of the list in form1. At the end of each iteration, the function step-fun is applied to the list; the default value for step-fun is cdr. … The for or as construct causes termination when the end of the list is reached.
This says that the step function is always applied at the end of the iteration, so it sounds like loop for head on list
should be OK. At any rate, any possible issues could be avoided by using do
loop instead:
(defun delete-adjacent-duplicates (list &key key (test 'eql))
(do ((head list (cdr head)))
((endp head) list)
(setf (cdr head)
(member (if (null key) (car head)
(funcall key (car head)))
(cdr head)
:key key :test-not test))))
The idea is to start with head
being the list, then setting its cdr
to the first tail that starts with a different element, then advancing the head, and continuing until there's nothing left. This should be linear in the length of the list, assuming that member
is implemented in a sensible way. The use of member
means that you don't have to do any extra work to get :key
and :test
working in the appropriate way. (Do note that :test
for del-dups
is going to be the :test-not
of member
.) Note: there's actually a slight issue with this, in that the key
function will called twice for each element in the final list: once when it's the first element of a tail, and once when it's the car
of head
.
CL-USER> (delete-adjacent-duplicates (list 1 1 1 1 2 2 3 3 3))
(1 2 3)
CL-USER> (delete-adjacent-duplicates (list 1 2 2))
(1 2)
CL-USER> (delete-adjacent-duplicates (list 1 3 5 6 4 2 3 5) :key 'evenp)
(1 6 3)
I expect that any linear time solution is going to take a similar approach; hold a reference to the current head, find the next tail that begins with a different element, and then make that tail the cdr
of the head.
I would expect REMOVE-DUPLICATES to have a linear time implementation. (And indeed it does* on my local SBCL install.)
Note that REMOVE-DUPLICATES and DELETE-DUPLICATES are specified to have the same return value, and that the side effects of DELETE-DUPLICATES are not guaranteed.
* The linear time code path is only taken when the :test is #'eq,#'eql, #'equal, or #'equalp (it relies on a hash table) and there is no :key or :test-not argument supplied.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With