I am quite new to Clojure, although I am familiar with functional languages, mainly Scala.
I am trying to figure out what is the idiomatic way to operate on collections in Clojure. I am particularly confused by the behaviour of functions such as map
.
In Scala, a great care is taken in making so that map
will always return a collection of the same type of the original collection, as long as this makes sense:
List(1, 2, 3) map (2 *) == List(2, 4, 6)
Set(1, 2, 3) map (2 *) == Set(2, 4, 6)
Vector(1, 2, 3) map (2 *) == Vector(2, 4, 6)
Instead, in Clojure, as far as I understand, most operations such as map
or filter
are lazy, even when invoked on eager data-structures. This has the weird result of making
(map #(* 2 %) [1 2 3])
a lazy-list instead of a vector.
While I prefer, in general, lazy operations, I find the above confusing. In fact, vectors guarantee certain performance characteristics that lists do not.
Say I use the result from above and append on its end. If I understand correctly, the result is not evaluated until I try to append on it, then it is evaluated and I get a list instead of a vector; so I have to traverse it to append on the end. Of course I could turn it into a vector afterwards, but this gets messy and can be overlooked.
If I understand correctly, map
is polymorphic and it would not be a problem to implement is so that it returns a vector on vectors, a list on lists, a stream on streams (this time with lazy semantics) and so on. I think I am missing something about the basic design of Clojure and its idioms.
What is the reason basic operations on clojure data structures do not preverse the structure?
Clojure collections "collect" values into compound values. There are four key Clojure collection types: vectors, lists, sets, and maps. Of those four collection types, vectors and lists are ordered.
Lazy Sequences in Clojure Clojure reference explains laziness as: Most of the sequence library functions are lazy, i.e. functions that return seqs do so incrementally, as they are consumed, and thus consume any seq arguments incrementally as well.
Maps are represented as alternating keys and values surrounded by { and } . When Clojure prints a map at the REPL, it will put `,'s between each key/value pair. These are purely used for readability - commas are treated as whitespace in Clojure. Feel free to use them in cases where they help you!
Clojure defines many algorithms in terms of sequences (seqs). A seq is a logical list, and unlike most Lisps where the list is represented by a concrete, 2-slot structure, Clojure uses the ISeq interface to allow many data structures to provide access to their elements as sequences.
In Clojure many functions are based on the Seq
abstraction.
The benefit of this approach is that you don't have to write a function for every different collection type - as long as your collection can be viewed as a sequence (things with a head and possibly a tail), you can use it with all of the seq functions. Functions that take seqs and output seqs are much more composable and thus re-usable than functions that limit their use to a certain collection type. When writing your own function on seq you don't need to handle special cases like: if the user gives me a vector, I have to return a vector, etc. Your function will fit in just as good inside the seq pipeline as any other seq function.
The reason that map returns a lazy seq is a design choice. In Clojure lazyness is the default for many of these functional constructions. If you want to have other behavior, like parallelism without intermediate collections, take a look at the reducers library: http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html
As far as performance goes, map always has to apply a function n times on a collection, from the first to the last element, so its performance will always be O(n) or worse. In this case vector or list makes no difference. The possible benefit that laziness would give you is when you would only consume the first part of the list. If you must append something to the end of map's output, a vector is indeed more efficient. You can use mapv
(added in Clojure 1.4) in this case: it takes in a collection and will output a vector. I would say, only worry about these performance optimizations if you have a very good reason for it. Most of the time it's not worth it.
Read more about the seq abstraction here: http://clojure.org/sequences
Another vector-returning higher order function that was added in Clojure 1.4 is filterv
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With