Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Operations on Clojure collections

I am quite new to Clojure, although I am familiar with functional languages, mainly Scala.

I am trying to figure out what is the idiomatic way to operate on collections in Clojure. I am particularly confused by the behaviour of functions such as map.

In Scala, a great care is taken in making so that map will always return a collection of the same type of the original collection, as long as this makes sense:

List(1, 2, 3) map (2 *) == List(2, 4, 6)
Set(1, 2, 3) map (2 *) == Set(2, 4, 6)
Vector(1, 2, 3) map (2 *) == Vector(2, 4, 6)

Instead, in Clojure, as far as I understand, most operations such as map or filter are lazy, even when invoked on eager data-structures. This has the weird result of making

(map #(* 2 %) [1 2 3])

a lazy-list instead of a vector.

While I prefer, in general, lazy operations, I find the above confusing. In fact, vectors guarantee certain performance characteristics that lists do not.

Say I use the result from above and append on its end. If I understand correctly, the result is not evaluated until I try to append on it, then it is evaluated and I get a list instead of a vector; so I have to traverse it to append on the end. Of course I could turn it into a vector afterwards, but this gets messy and can be overlooked.

If I understand correctly, map is polymorphic and it would not be a problem to implement is so that it returns a vector on vectors, a list on lists, a stream on streams (this time with lazy semantics) and so on. I think I am missing something about the basic design of Clojure and its idioms.

What is the reason basic operations on clojure data structures do not preverse the structure?

like image 381
Andrea Avatar asked Jan 02 '13 17:01

Andrea


People also ask

What is a collection in Clojure?

Clojure collections "collect" values into compound values. There are four key Clojure collection types: vectors, lists, sets, and maps. Of those four collection types, vectors and lists are ordered.

What is Lazyseq in Clojure?

Lazy Sequences in Clojure Clojure reference explains laziness as: Most of the sequence library functions are lazy, i.e. functions that return seqs do so incrementally, as they are consumed, and thus consume any seq arguments incrementally as well.

How does map work Clojure?

Maps are represented as alternating keys and values surrounded by { and } . When Clojure prints a map at the REPL, it will put `,'s between each key/value pair. These are purely used for readability - commas are treated as whitespace in Clojure. Feel free to use them in cases where they help you!

Is Clojure a sequence?

Clojure defines many algorithms in terms of sequences (seqs). A seq is a logical list, and unlike most Lisps where the list is represented by a concrete, 2-slot structure, Clojure uses the ISeq interface to allow many data structures to provide access to their elements as sequences.


1 Answers

In Clojure many functions are based on the Seq abstraction. The benefit of this approach is that you don't have to write a function for every different collection type - as long as your collection can be viewed as a sequence (things with a head and possibly a tail), you can use it with all of the seq functions. Functions that take seqs and output seqs are much more composable and thus re-usable than functions that limit their use to a certain collection type. When writing your own function on seq you don't need to handle special cases like: if the user gives me a vector, I have to return a vector, etc. Your function will fit in just as good inside the seq pipeline as any other seq function.

The reason that map returns a lazy seq is a design choice. In Clojure lazyness is the default for many of these functional constructions. If you want to have other behavior, like parallelism without intermediate collections, take a look at the reducers library: http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html

As far as performance goes, map always has to apply a function n times on a collection, from the first to the last element, so its performance will always be O(n) or worse. In this case vector or list makes no difference. The possible benefit that laziness would give you is when you would only consume the first part of the list. If you must append something to the end of map's output, a vector is indeed more efficient. You can use mapv (added in Clojure 1.4) in this case: it takes in a collection and will output a vector. I would say, only worry about these performance optimizations if you have a very good reason for it. Most of the time it's not worth it.

Read more about the seq abstraction here: http://clojure.org/sequences

Another vector-returning higher order function that was added in Clojure 1.4 is filterv.

like image 128
Michiel Borkent Avatar answered Nov 03 '22 18:11

Michiel Borkent