Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are map keys interpreted in Clojure?

I'm trying to create a map literal with the keys determined from a random function:

user=> {(str (rand-int 5)) "hello" (str (rand-int 5)) "goodbye"}                                            
IllegalArgumentException Duplicate key: (str (rand-int 5))  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:71)

whereas

user=> {(str (rand-int 5)) "hello" (str (rand-int 6)) "goodbye"}    
{"4" "hello", "2" "goodbye"}

The Reader appears to be treating the key as an un-evaluated list.

I can't find any details about this in the documentation. Is there anyone who can help me understand this a little more?

like image 690
stuartrexking Avatar asked Nov 18 '16 07:11

stuartrexking


4 Answers

Walking through the source of Clojure compiler I've found the following:

  1. There's class LispReader which contains nested class MapReader that is responsible for reading map literals. It's method invoke reads Clojure forms between {, } symbols and returns a map (of Clojure forms) by calling RT.map method.

  2. RT.map calls PersistentHashMap.createWithCheck where actual check for duplicated keys is performed. Since we're building map of Clojure forms the check will trigger even if there are two same forms that evaluate to different values (like in your example).

  3. Evaluation of all Clojure forms is made in Compiler class, in particular map forms are evaluated within it's nested class MapExpr. It's method eval evaluates map's keys and values and again builds persistent map using RT.map. So check for duplicated keys will be performed against evaluated values as well, that's why the following code will also fail:

(let [x :foo y :foo]
  {x :bar y :baz}) ;; throws duplicated key exception

I'm not sure why authors decided to perform check for duplicated keys on both map of forms and map of values. Probably, it's some kind of "fail fast strategy": such implementation will report about errors early at compilation stage (eventhough there may be false positives) and this check won't be deffered to the runtime.

like image 142
OlegTheCat Avatar answered Sep 21 '22 20:09

OlegTheCat


Everything produced by the reader is unevaluated. That's a main idea behind the reader: It reads forms as data with minimal-to-no interpretation. The reader gives the unevaluated map to the compiler.

The reader works by building the map up incrementally via assoc or conj. However, in the past, this approach would have produced an even stranger result for your code: {(str (rand-int 5)) "goodbye"}. That is, normal assoc-rules would apply: Last key-value-pair added wins. Folks ran in to this problem, so now the reader performs a contains? check before adding values incrementally.

This article discusses Lisp-style readers in a bit more detail: http://calculist.org/blog/2012/04/17/homoiconicity-isnt-the-point/

like image 43
Brandon Bloom Avatar answered Sep 19 '22 20:09

Brandon Bloom


You are correct in that the reader doesn't evaluate the map.

Remember that evaluation happens after reading.

From the Clojure reader reference documentation:

That said, most Clojure programs begin life as text files, and it is the task of the reader to parse the text and produce the data structure the compiler will see. This is not merely a phase of the compiler. The reader, and the Clojure data representations, have utility on their own in many of the same contexts one might use XML or JSON etc.

One might say the reader has syntax defined in terms of characters, and the Clojure language has syntax defined in terms of symbols, lists, vectors, maps etc. The reader is represented by the function read, which reads the next form (not character) from a stream, and returns the object represented by that form.

The evaluator needs the unevaluated map to walk through it and evaluate its keys and values. If that map has the same form more than once as a key it is an invalid map literal.

You can use the hash-map function instead

(hash-map (rand-int 5) "hello"
          (rand-int 5) "goodbye")`.

Note that the resulting map may have one or two keys depending on whether the keys are distinct.

If you want to enforce two keys do sth. like

(zipmap (distinct (repeatedly #(rand-int 5)))
        ["hello" "goodbye"])
like image 38
Leon Grapenthin Avatar answered Sep 21 '22 20:09

Leon Grapenthin


I don't know the answer to the question about the reader, but a safer way to construct this hash-map would be to ensure the keys are different. For example:

(let [[k1 k2] (shuffle (range 5))] 
  {k1 "hello" k2 "goodbye"})
like image 39
Michiel Borkent Avatar answered Sep 21 '22 20:09

Michiel Borkent