
Clojure & ClojureScript: clojure.core/read-string, clojure.edn/read-string and cljs.reader/read-string

I am not clear about the relationship between all these read-string functions. Well, it is clear that clojure.core/read-string can read any serialized string that is output by pr[n] or even print-dup. It is also clear that clojure.edn/read-string does read strings that are formatted according to the EDN specification.

However, I am starting with ClojureScript, and it is not clear which of the two cljs.reader/read-string complies with. This question was triggered by the fact that I had a web service that was emitting Clojure data serialized this way:

(with-out-str (binding [*print-dup* true] (prn tags))) 

That produced an object serialization that includes the datatypes. However, it was not readable by cljs.reader/read-string; I always got an error of this type:

Could not find tag parser for = in ("inst" "uuid" "queue" "js")  Format should have been EDN (default) 

At first, I thought that this error was thrown by cljs-ajax, but after testing cljs.reader/read-string in a Rhino REPL I got the same error, which means it is thrown by cljs.reader/read-string itself. It is thrown by the maybe-read-tagged-type function in cljs.reader, but it is not clear if this is because the reader only works with EDN data, or if...?

Also, from the Differences from Clojure document, the only thing that is said is:

The read and read-string functions are located in the cljs.reader namespace 

This suggests that they should have exactly the same behavior.

Neoasimov asked Jul 09 '14 18:07



2 Answers

Summary: Clojure is a superset of EDN. By default, pr, prn and pr-str, when given Clojure data structures, produce valid EDN. *print-dup* changes that and makes them use the full power of Clojure to give stronger guarantees about the "sameness" of the objects in memory after a round-trip. ClojureScript can only read EDN, not full Clojure.

Easy solution: do not set *print-dup* to true, and only pass pure data from Clojure to ClojureScript.

Harder solution: use tagged literals, with a (possibly shared) associated reader on both sides. (This will still not involve *print-dup*, though.)

Tangentially related: most use-cases for EDN are covered by Transit, which is faster, especially on the ClojureScript side.
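The tagged-literal route mentioned above could look roughly like the following on the Clojure side. This is only a sketch: the Point record, the #my/point tag and the point-readers map are hypothetical names chosen for illustration, not something from the question.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical application type (illustration only).
(defrecord Point [x y])

;; Printing side: emit Points as a tagged literal instead of the default
;; record syntax, which a plain EDN reader has no handler for.
(defmethod print-method Point [p ^java.io.Writer w]
  (.write w (str "#my/point " (pr-str [(:x p) (:y p)]))))

;; Reading side: map the (hypothetical) tag to a constructor function.
(def point-readers {'my/point (fn [[x y]] (->Point x y))})

;; Note that *print-dup* is left alone.
(def wire (pr-str {:origin (->Point 0 0)}))
(def decoded (edn/read-string {:readers point-readers} wire))
```

On the ClojureScript side, the matching handler would presumably be registered with cljs.reader/register-tag-parser! before calling cljs.reader/read-string; the exact setup depends on the ClojureScript version in use.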


Let's start with the Clojure part. Clojure had, from the start, a clojure.core/read-string function, which reads a string in the old Lispy sense of the Read-Eval-Print-Loop, i.e. it gives access to the actual reader used in the compilation of Clojure.[0]

Later on, Rich Hickey & co decided to promote the data notation of Clojure and published the EDN spec. EDN is a subset of Clojure; it is limited to the data elements of the Clojure language.

As Clojure is a Lisp and, like all lisps, touts the "code is data is code" philosophy, the actual implications of the above paragraph may not be completely clear. I am not sure there is a detailed diff anywhere, but a careful examination of the Clojure Reader description and the previously mentioned EDN spec shows a few differences. The most obvious differences are around macro characters and in particular the # dispatch symbol, which has many more targets in Clojure than in EDN. For example, the #(* % %) notation is valid Clojure, which the Clojure reader will turn into the equivalent of the following EDN: (fn [x] (* x x)). Of particular importance for this question is the scarcely documented #= special reader macro, which can be used to execute arbitrary code right inside the reader.
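As a small illustration of that difference, the sketch below feeds the same #(...) text to both readers on the JVM. The var names are arbitrary, and the exact exception type and message from the EDN reader may vary between Clojure versions, so the example only checks that reading fails.

```clojure
(require '[clojure.edn :as edn])

;; The full Clojure reader expands the #(...) shorthand into a (fn* ...) form.
(def expanded (read-string "#(* % %)"))
(def head (first expanded))   ; the symbol fn*

;; The EDN reader has no #( dispatch, so the same text is rejected.
(def edn-result
  (try (edn/read-string "#(* % %)")
       (catch Exception _ :not-edn)))
```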

As the complete language is available to the Clojure reader, it is possible to embed code into the character string that the reader is reading and have it evaluated right then and there in the reader. A few examples can be found here.

The clojure.edn/read-string function is strictly limited to the EDN format, not the whole Clojure language. In particular, its operation is not influenced by the *read-eval* variable and it cannot read all of the valid Clojure code fragments possible.
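A minimal sketch of how *read-eval* affects the two readers (var names are arbitrary; *read-eval* is bound explicitly so the example does not depend on the JVM's configured default):

```clojure
(require '[clojure.edn :as edn])

;; With *read-eval* true, #= runs code while reading.
(def evaluated
  (binding [*read-eval* true] (read-string "#=(+ 1 2)")))

;; With *read-eval* false, the full reader refuses #= forms.
(def guarded
  (try (binding [*read-eval* false] (read-string "#=(+ 1 2)"))
       (catch Exception _ :refused)))

;; clojure.edn/read-string rejects #= whatever *read-eval* is set to.
(def edn-attempt
  (try (binding [*read-eval* true] (edn/read-string "#=(+ 1 2)"))
       (catch Exception _ :refused)))
```

This is why clojure.edn/read-string is the usual choice for strings that come from an untrusted source.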

It turns out that the Clojure reader is, for mostly historical reasons, written in Java. As it is a significant piece of software, works well, and has been largely debugged and battle-tested by a few years of active Clojure usage in the wild, Rich Hickey decided to reuse it in the ClojureScript compiler (this is the main reason why the ClojureScript compiler runs on the JVM). The ClojureScript compilation process happens mostly on the JVM, where the Clojure reader is available, and thus ClojureScript code is parsed by the clojure.core/read-string (or rather its close cousin clojure.core/read) function.

But your web application does not have access to a running JVM. Requiring a Java applet for ClojureScript applications did not look like a very promising idea, especially as the main objective of ClojureScript was to extend the reach of the Clojure language beyond the confines of the JVM (and the CLR). So the decision was taken that ClojureScript would not have access to its own reader, and consequently would not have access to its own compiler either (i.e. there is no eval, read or read-string in ClojureScript). This decision and its implications are discussed in greater detail here, by someone who actually knows how things happened (I was not there, so there may be some inaccuracies in the historical perspective of this explanation).

So ClojureScript has no equivalent of clojure.core/read-string (and some would argue that it is therefore not a true lisp). Still, it would be nice to have some way to communicate Clojure data structures between a Clojure server and a ClojureScript client, and indeed that was one of the motivating factors in the EDN effort. Just as Clojure got a restricted (and safer) reading function (clojure.edn/read-string) after the publication of the EDN spec, ClojureScript also got an EDN reader in the standard distribution as cljs.reader/read-string. It may be argued that a little more consistency between the names of these two functions (or rather their namespace) would have been good.

Before we can finally answer your original question, we need one more little piece of context regarding *print-dup*. Remember that *print-dup* was part of Clojure 1.0, which means it predates EDN, the notion of tagged literals, and records. I would argue that EDN and tagged literals offer a better alternative for most of the use-cases of *print-dup*. As Clojure is generally built on top of a few data abstractions (list, vector, set, map, and the usual scalars), the default behaviour of the print/read cycle is to preserve the abstract shape of the data (a map is a map), but not especially its concrete type. For example, Clojure has multiple implementations of the map abstraction, such as PersistentArrayMap for small maps and PersistentHashMap for bigger ones. The default behaviour of the language assumes that you do not care about the concrete type.
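To make the "abstract shape, not concrete type" point concrete, here is a small sketch using a sorted map (the var names are arbitrary):

```clojure
(require '[clojure.edn :as edn])

;; A sorted map is backed by a tree-based implementation.
(def original (sorted-map :a 1 :b 2))

;; The default printer only writes the map's contents, so the reader
;; rebuilds it with whatever map implementation it picks for a literal.
(def round-tripped (edn/read-string (pr-str original)))

(def same-value? (= original round-tripped))                  ; equal as maps
(def same-class? (= (class original) (class round-tripped)))  ; concrete type is lost
```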

For the rare cases where you do, or for the more specialized types (defined with deftype or defstruct, at the time), you might want more control over how these are read, and that is what *print-dup* is for.

The point is, with *print-dup* set to true, pr and family will not produce valid EDN, but rather Clojure data that includes explicit #=(eval build-my-special-type) forms, which only the full Clojure reader understands.
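The effect can be checked at a Clojure REPL with something like the sketch below. The small map stands in for the tags value from the question, and the exact text printed under *print-dup* may differ between Clojure versions, so the example only looks for the #= marker.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

(def tags {:a 1})   ; stand-in for the `tags` value in the question

(def plain (pr-str tags))
(def dup   (binding [*print-dup* true] (pr-str tags)))

;; Does the *print-dup* output contain a #= reader-eval form?
(def dup-has-reader-eval? (str/includes? dup "#="))

;; The default output is plain EDN and reads back fine.
(def plain-read (edn/read-string plain))

;; The *print-dup* output is rejected by the EDN reader...
(def dup-edn-read
  (try (edn/read-string dup)
       (catch Exception _ :not-edn)))

;; ...but the full Clojure reader accepts it when *read-eval* is true.
(def dup-full-read
  (binding [*read-eval* true] (read-string dup)))
```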

[0]: In "lisps", the compiler is explicitly defined in terms of data structures, rather than in terms of character strings. While that may seem like a small difference with usual compilers (which do indeed transform the character stream into data structures during their processing), the defining characteristic of Lisp is that the data structures that are emitted by the reader are the data structures commonly used in the language. In other words, the compiler is basically just a function available at all times in the language. This is not as unique as it used to be, as most dynamic languages support some form of eval; what is unique to Lisp is that eval takes a data structure, not a character string, which makes dynamic code generation and evaluation much easier. One important implication of the compiler being "just another function" is that the compiler actually runs with the whole language already defined and available, and all of the code read so far also available, which opens up the door to the Lisp macro system.

Gary Verhaegen answered Sep 23 '22 20:09


cljs.reader/read only supports EDN, but pr etc. will output tags (in particular, for protocols and records) which won't read.

In general, if on the Clojure side you can verify that (= value (clojure.edn/read-string (pr-str value))), your cljs interop should work. This can be limiting, and there is some discussion of workarounds or fixes to the EDN library.
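That check can be wrapped in a small helper. This is a sketch: edn-round-trips? and the Tag record are hypothetical names, and treating a read failure as false is a choice made here, not part of the original advice.

```clojure
(require '[clojure.edn :as edn])

(defn edn-round-trips?
  "True when value survives pr-str followed by clojure.edn/read-string."
  [value]
  (try (= value (edn/read-string (pr-str value)))
       (catch Exception _ false)))

;; Hypothetical record, used only to show a value that does not round-trip.
(defrecord Tag [name])

(def plain-ok?  (edn-round-trips? {:tags ["a" "b"] :count 2}))
(def record-ok? (edn-round-trips? (->Tag "a")))
```

Plain maps, vectors, strings and numbers pass, while the record fails because the EDN reader has no handler for the tag that pr emits for it.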

Depending on what your data looks like, you might take a look at the tagged library as described in the Clojure Cookbook.

Michael Victor Zink answered Sep 22 '22 20:09