
Remove duplicate strings from a list

Tags:

common-lisp

I have a dead simple Common Lisp question: what is the idiomatic way of removing duplicates from a list of strings?

remove-duplicates works as I'd expect for numbers, but not for strings:

* (remove-duplicates '(1 2 2 3))

(1 2 3)

* (remove-duplicates '("one" "two" "two" "three"))

("one" "two" "two" "three")

I'm guessing there's some sense in which the strings aren't equal, most likely because although "foo" and "foo" are apparently identical, they're actually pointers to different structures in memory. I think my expectation here may just be a C hangover.

Duncan Bayne asked Oct 29 '11 09:10


1 Answer

You have to tell remove-duplicates how it should compare the values. By default, it uses eql, which is not sufficient for strings. Pass a comparison function via the :test keyword, as in:

(remove-duplicates your-sequence :test #'equal)
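To make that concrete, here is a minimal sketch you could paste into a REPL. The variable name *words* is just an illustration, not something from the question.

```lisp
;; *words* is a hypothetical example list with one duplicated string.
(defparameter *words* (list "one" "two" "two" "three"))

;; equal compares strings by content, so the duplicate "two" is dropped.
(remove-duplicates *words* :test #'equal)
;; => ("one" "two" "three")

;; By default the later occurrence of a duplicate is kept;
;; :from-end t keeps the earlier one instead.
(remove-duplicates (list "a" "b" "a") :test #'equal)
;; => ("b" "a")
(remove-duplicates (list "a" "b" "a") :test #'equal :from-end t)
;; => ("a" "b")
```

remove-duplicates is non-destructive, so *words* itself still holds all four strings afterwards.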

(Edit to address the question from the comments): As an alternative to equal, you could use string= in this example. This predicate is less generic than equal, so it might be faster, though nothing guarantees that. A more tangible benefit is that string= can tell you when you pass a value of the wrong type:

(equal 1 "foo")

happily yields nil, whereas

(string= 1 "foo")

signals a type-error condition. Note, though, that

(string= "FOO" :FOO)

is perfectly well defined (string= and its friends are defined in terms of "string designators", not strings), so the type safety only goes so far here.
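A small sketch of what string designators mean in practice, assuming the default readtable (which upcases symbol names). The function name designator-demo is made up for this example.

```lisp
;; designator-demo is a hypothetical helper that collects a few comparisons.
(defun designator-demo ()
  (list (string= "FOO" :foo)   ; T: the reader upcases :foo, so its name is "FOO"
        (string= "foo" :foo)   ; NIL: string= is case-sensitive
        (string= "a" #\a)      ; T: a character designates a one-character string
        ;; A string and a keyword with the same name count as duplicates.
        (length (remove-duplicates (list "FOO" :foo "FOO") :test #'string=))))

(designator-demo)
;; => (T NIL T 1)
```

So if your list might mix strings and symbols, string= will quietly treat them as equal where equal would not.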

The standard eql predicate, on the other hand, is almost never the right way to compare strings. If you are familiar with Java, think of eql as using ==, and of equal (or string=, etc.) as calling the equals(Object) method. Though eql does some type introspection (as opposed to eq, which does not), for most Lisp types other than numbers and characters eql boils down to something like a pointer comparison. That is not sufficient if you want to compare values by what they actually contain rather than by where in memory they are located.

For those more familiar with Python, eq (and eql for non-numeric types) is more like the is operator, whereas equal is more like ==, which calls __eq__.
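The difference between the predicates can be checked directly. This is only a sketch; compare-demo is a made-up name, and copy-seq is used to make sure the two strings really are separate objects.

```lisp
;; compare-demo is a hypothetical helper; copy-seq returns a fresh string
;; each time, so a and b have equal contents but are distinct objects.
(defun compare-demo ()
  (let ((a (copy-seq "foo"))
        (b (copy-seq "foo")))
    (list (eql a b)       ; NIL: two distinct string objects
          (eql a a)       ; T: the very same object
          (equal a b)     ; T: same contents
          (string= a b)   ; T: same contents
          (eql 2 2)       ; T: eql compares numbers by type and value
          (eql 2 2.0))))  ; NIL: integer versus float

(compare-demo)
;; => (NIL T T T T NIL)
```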

Dirk answered Sep 28 '22 06:09