Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

On performance of Clojure's `first` function

I saw the following example in Rich's video on sequences http://blip.tv/file/734409 about 33-36 minutes into it:

(first "abcd") => \a

Now, he say that this expands to (sort of):

(first "abcd") => (first (seq "abcd")) => (first '(\a \b \c \d))

So, it looks like an O(N) operation because the full copy of the string is being made. First of all, if a String is immutable, then why is it copied? (Edit: based on an answer, it probably is not; just looked that way when printed.) Secondly, suppose first operated on something else in Java that is mutable, say a linked list of integers. Should first act in a lazy manner (e.g. create a persistent sequence first)? Would not it make sense to evaluate it right away and save it? It would be some sort of a hack that would break the nice abstraction, but get the job done fast, I think. When you call (seq "abcd"), you do not know how it will be used. When you call a first on a seq, you know what to do. But, when you call first on "abcd", I think that performing a hacky and fast "grab and save it", approach is better than grab a sequence and then call first.

Am I missing something? Did Rich Hickey skip some steps?

Let me know if I have questions. Thanks!

like image 264
Hamish Grubijan Avatar asked Oct 03 '10 16:10

Hamish Grubijan


3 Answers

It doesn't follow that a full copy of the string is being made. (But this is a good question.)

In the source for clojure.lang.RT you'll notice that the runtime uses charsequence to create the sequence:

static ISeq seqFrom(Object coll){
    if(coll instanceof Seqable)
        return ((Seqable) coll).seq();
    else if(coll == null)
        return null;
    else if(coll instanceof Iterable)
        return IteratorSeq.create(((Iterable) coll).iterator());
    else if(coll.getClass().isArray())
        return ArraySeq.createFromObject(coll);
    else if(coll instanceof CharSequence)
        return StringSeq.create((CharSequence) coll);
          .
          .
          .

So really, this is a java question, not a clojure question. I haven't checked, but I'm fairly certain that CharSequence does the "right thing".

like image 116
Rob Lachlan Avatar answered Nov 09 '22 02:11

Rob Lachlan


The other answers are correct, but I'll use this chance to point out an interesting effect of Clojure's philosophy of immutability.

So, it looks like an O(N) operation because the full copy of the string is being made.

Strings, like other data structures in Clojure, are immutable. (They're a special case, being implemented in the JVM rather than Clojure, but that's immaterial to this point.)

Copies of immutable objects are free. That's right, free. Because you don't need to copy at all. The same allocated object in memory can simply be reused wholesale, because it's guaranteed to always be the same as when it was "copied."

So a function like seq never has to actually copy anything. It just operates on whatever is passed, directly, and returns a (possibly lazy) sequence that provides an abstract interface to whatever you called seq on.

And so, seq is always O(1).

like image 36
levand Avatar answered Nov 09 '22 01:11

levand


You might look at the seq as if it creates a lazy sequence based on the string.

It doesn't actually create a lazy sequence, it actually short circuits to the java implementation of CharSequence, but that's an implementation detail.

From a complexity point of view, first is O(1) on a seq, and you can create a seq in constant time from anything is iterable.

like image 3
mkm Avatar answered Nov 09 '22 03:11

mkm