 

Data format safety in Clojure

Tags:

clojure

Coming from a Java background, I'm quite fond of static type safety and wonder how Clojure programmers deal with the problem of data format definitions (perhaps not just types but general invariants, because types are just a special case of those).

This is similar to an existing question "Type Safety in Clojure", but that one focuses more on the aspect of how to check types at compile time, while I'm more interested in how the problem is pragmatically addressed.

As a practical example, I'm considering an editor application which handles a particular document format. Each document consists of elements that come in several different varieties (graphics elements, font elements, etc.). There would be editors for the different element types, and of course also functions to convert a document to and from a byte stream in its native on-disk format.

The basic problem I am interested in is that the editors and the read/write functions have to agree on a common data format. In Java, I would model the document's data as an object graph, e.g. with one class representing a document and one class for each element variety. This way, I get a compile-time guarantee about what the structure of my data looks like, and that the field "width" of a graphics element is an integer and not a float. It does not guarantee that width is positive - but using a getter/setter interface would allow the corresponding class to add invariant guarantees like that.

Being able to rely on this makes the code dealing with this data simpler, and format violations can be caught at compile time or early at runtime (at the point where some code attempts a modification that would violate an invariant).

How can you achieve a similar "data format reliability" in Clojure? As far as I know, there is no way to perform compile-time checking, and hiding domain data behind a function interface seems to be discouraged as non-idiomatic (or maybe I misunderstand?). So what do Clojure developers do to feel safe about the format of data handed into their functions? How do you get your code to error out as quickly as possible, rather than after the user has edited for 20 more minutes and tries to save to disk, at which point the save function notices that there is a graphics element in the list of fonts due to an editor bug?

Please note that I'm interested in Clojure and learning, but haven't written any actual software with it yet, so it's possible that I'm just confused and the answer is very simple. If so, sorry for wasting your time :).

Medo42 asked Feb 11 '12 17:02



2 Answers

I don't see anything wrong or unidiomatic about using a validating API to construct and manipulate your data, as in the following:

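;; Constructor: the :pre conditions reject malformed arguments at call time.
(defn text-box [text height width]
  {:pre [(string? text) (integer? height) (integer? width)]}
  {:type 'text-box :text text :height height :width width})

;; valid-color? is not defined in the original answer; this placeholder
;; (an assumption) accepts a small fixed set of keywords.
(defn valid-color? [color]
  (contains? #{:red :green :blue} color))

;; Manipulation function: validates the new value before updating the map.
(defn colorize [thing color]
  {:pre [(valid-color? color)]}
  (assoc thing :color color))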

... (colorize (text-box "Hi!" 20 30) :green) ...
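
To illustrate what happens when such a precondition is violated, here is a minimal sketch. The `graphics-element` constructor and the `precondition-fails?` helper are hypothetical names, chosen to mirror the "width must be a positive integer" invariant from the question. A failed `:pre` check throws `java.lang.AssertionError`, provided assertions are enabled (`*assert*` is true by default).

```clojure
;; Hypothetical constructor (not from the original answer): width and height
;; must both be positive integers.
(defn graphics-element [width height]
  {:pre [(integer? width) (pos? width)
         (integer? height) (pos? height)]}
  {:type :graphics :width width :height height})

;; Helper for demonstration only: true if calling f with args trips a
;; precondition, false if the call succeeds.
(defn precondition-fails? [f & args]
  (try (apply f args)
       false
       (catch AssertionError _ true)))

(graphics-element 30 20)                        ; a plain map is returned
(precondition-fails? graphics-element -5 20)    ; negative width is rejected
(precondition-fails? graphics-element 1.5 20)   ; non-integer width is rejected
```

Because the conditions are checked in order, `integer?` guards `pos?` against being called on a non-number.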

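In addition, references (vars, refs, atoms, agents) can have an associated validator function. It is run against every proposed new value, and a change it rejects throws an exception and leaves the old state in place, so the reference holds a valid state at all times.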
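
As a rough sketch of the validator approach, the following attaches a validator to an atom holding the document. The document shape (a map with a `:fonts` vector) and the names `valid-document?`, `document` and `rejected?` are assumptions made for illustration; `atom` with the `:validator` option and `swap!` are standard `clojure.core` functions.

```clojure
;; Hypothetical document shape: a map with a :fonts vector whose entries
;; must all be font elements.
(defn valid-document? [doc]
  (and (map? doc)
       (vector? (:fonts doc))
       (every? #(= :font (:type %)) (:fonts doc))))

;; The validator is checked against the initial value and against every
;; proposed new value.
(def document (atom {:fonts []} :validator valid-document?))

;; A valid update goes through.
(swap! document update :fonts conj {:type :font :name "Helvetica"})

;; An invalid update (a graphics element in the font list) is rejected with
;; an IllegalStateException, and the atom keeps its previous value.
(def rejected?
  (try (swap! document update :fonts conj {:type :graphics :width 30})
       false
       (catch IllegalStateException _ true)))
```

With this in place, the editor bug from the question would surface at the moment of the faulty update rather than at save time.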

Matthias Benkard answered Nov 03 '22 10:11



Good question - I also find that moving from a statically typed language to a dynamic one requires a bit more care about type safety. Fortunately, TDD techniques help a great deal here.

I typically write a "validate" function which checks all my assumptions about the data structure. I often do this in Java too for invariant assumptions, but in Clojure it's more important because you need to check things like types as well.

You can then use the validate function in several ways:

  • As a quick check at the REPL: (validate foo)
  • In unit tests: (is (validate (new-foo-from-template a b c)))
  • As a run-time check for key functions, e.g. checking that (read-foo some-foo-input-stream) is valid
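
The three uses above might look roughly like this. The font element shape and the names `new-font` and `checked` are hypothetical; only `clojure.test` and `ex-info` are standard library features.

```clojure
(require '[clojure.test :refer [deftest is]])

;; Hypothetical font element: a map with :type :font, a non-empty :name
;; string and a positive integer :size.
(defn validate [font]
  (boolean
   (and (map? font)
        (= :font (:type font))
        (string? (:name font))
        (seq (:name font))
        (integer? (:size font))
        (pos? (:size font)))))

;; Hypothetical constructor used in the examples below.
(defn new-font [name size]
  {:type :font :name name :size size})

;; 1. Quick check at the REPL.
(validate (new-font "Helvetica" 12))

;; 2. Unit test.
(deftest new-font-is-valid
  (is (validate (new-font "Helvetica" 12)))
  (is (not (validate (new-font "Helvetica" -1)))))

;; 3. Run-time check: returns the value unchanged or throws, so it can wrap
;;    the result of a reader function.
(defn checked [font]
  (when-not (validate font)
    (throw (ex-info "Invalid font element" {:value font})))
  font)
```

Returning a plain boolean from `validate` keeps it usable in all three places; a variant that returns a description of what failed would give better error messages at the cost of a slightly more complex test.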

If you have a complex data structure which is a tree of multiple different component types, you can write a validate function for each component type and have the validate function for the whole document call validate for each sub-component recursively. A nice trick is to use either protocols or multimethods to make the validate function polymorphic for each component type.
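
A minimal sketch of the multimethod variant, assuming every component is a map carrying a `:type` key (the element shapes and the `doc` example are made up for illustration):

```clojure
;; Dispatch on the component's :type key.
(defmulti validate :type)

(defmethod validate :font [{font-name :name}]
  (and (string? font-name) (boolean (seq font-name))))

(defmethod validate :graphics [{:keys [width height]}]
  (and (integer? width) (pos? width)
       (integer? height) (pos? height)))

;; A document is valid when each list holds only elements of the expected
;; type and every element passes its own validate method.
(defmethod validate :document [{:keys [fonts graphics]}]
  (and (every? #(= :font (:type %)) fonts)
       (every? #(= :graphics (:type %)) graphics)
       (every? validate fonts)
       (every? validate graphics)))

;; Components with an unknown or missing :type are never valid.
(defmethod validate :default [_] false)

(def doc
  {:type :document
   :fonts [{:type :font :name "Helvetica"}]
   :graphics [{:type :graphics :width 30 :height 20}]})

(validate doc)
;; A graphics element in the font list makes the whole document invalid.
(validate (update doc :fonts conj {:type :graphics :width 30 :height 20}))
```

Adding a new element variety then only requires one more `defmethod`; the document-level check does not need to change unless the document gains a new list.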

mikera answered Nov 03 '22 09:11
