
Implementing a data model to prevent common errors

There seem to be multiple ways to implement data models in Clojure:

  • ordinary built-in datatypes (maps/lists/sets/vectors)
  • built-in datatypes + meta-data -- for example: (type ^{:type ::mytype} {:fieldname 1})
  • built-in datatypes + special accessor functions (for instance, getting a non-existent key from a map throws an exception, instead of silently returning nil)
  • deftype
  • defstruct
  • defrecord
  • defprotocol
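
To make the "special accessor functions" option concrete, here is a minimal sketch of a strict lookup. The name get-strict is hypothetical (it is not part of clojure.core), and ex-info assumes Clojure 1.4 or later.

```clojure
;; Hypothetical helper, not part of clojure.core.
(defn get-strict
  "Like `get`, but throws when key k is absent from map m
  instead of silently returning nil."
  [m k]
  (if (contains? m k)
    (get m k)
    (throw (ex-info (str "Missing key: " k) {:key k :map m}))))

(get-strict {:fieldname 1} :fieldname) ; returns 1
;; (get-strict {:fieldname 1} :feildname) throws an ExceptionInfo
```

Using contains? rather than testing for nil means a key that is present with a nil value is still treated as present.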

We've reached the point where maps/lists are no longer working well for us. We run into lots of errors that pre-conditions/post-conditions could easily catch but that otherwise take a very long time to hunt down, and it's hard to write effective pre/post-conditions for functions that accept nested maps/lists/vectors. We're not sure which of the above to choose.

We have three major goals:

  • write idiomatic Clojure code
  • avoid spending large amounts of time hunting down stupid type errors
  • have confidence in our ability to change/refactor code without silently breaking anything

How can we harness the power of Clojure to help us?

Matt Fenwick asked Oct 26 '11 16:10



1 Answer

Clojure culture is strongly supportive of the raw data types, and justifiably so. But explicit types can be useful: when your plain datatypes get sufficiently complex and specific, you essentially have an implicit datatype without the specification.

Rely on constructors. This sounds a bit dirty, in an OOP kind of way, but a constructor is nothing more than a function that creates your data type safely and conveniently. A drawback of plain data structures is that they encourage creating the data on the fly. So instead of calling (myconstructor ...), I end up composing the data directly, with much potential for error and extra work if I later need to change the underlying data type.
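
As a rough illustration of the constructor idea, a plain function with a :pre map can validate its arguments before building the data. The name make-account and its fields are hypothetical, chosen only for this sketch.

```clojure
;; Hypothetical constructor: the only place that knows the shape of an account.
(defn make-account
  "Validates its arguments, then builds the account map."
  [owner balance]
  {:pre [(string? owner) (number? balance) (not (neg? balance))]}
  {:owner owner :balance balance})

(make-account "alice" 100) ; a valid account map
;; (make-account "alice" -5) fails a pre-condition with an AssertionError
```

If the representation later changes (say, from a map to a record), callers that go through make-account should not need to change.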

Records are the sweet spot. With all the fuss about raw data types, it's easy to forget that records do a lot of things that maps can do. They can be accessed the same way. You can call seq on them. You can destructure them the same way. The vast majority of functions that expect a map will accept a record as well.
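
A short sketch of that map-like behaviour, using a hypothetical Point record:

```clojure
;; Hypothetical record used only for illustration.
(defrecord Point [x y])

(def p (->Point 1 2))

(:x p)                  ; keyword access, as with a map
(get p :y)              ; `get` works too
(let [{:keys [x y]} p]  ; map destructuring
  (+ x y))
(seq p)                 ; a seq of map entries
(assoc p :x 10)         ; still a Point, with :x replaced
```

One difference worth knowing: unlike a map, a record is not itself a function of its keys, so (p :x) does not work, while (:x p) does.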

Meta data will not save you. My main objection to relying on meta data is that it isn't reflected in equality.

user> (= (with-meta [1 2 3] {:type :A})  (with-meta [1 2 3] {:type :B}))
true

Whether that's acceptable or not is up to you, but I'd worry about this introducing new subtle bugs.
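
For contrast, a record's type does take part in equality. A minimal sketch, with hypothetical record names A and B:

```clojure
;; Two hypothetical record types with identical fields.
(defrecord A [v])
(defrecord B [v])

(= (->A [1 2 3]) (->A [1 2 3])) ; true
(= (->A [1 2 3]) (->B [1 2 3])) ; false
```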


The other datatype options:

  • deftype is only for low-level work in creating new basic or special-purpose data structures. Unlike defrecord, it doesn't bring all of the Clojure goodness along with it. For most work, it isn't necessary or advisable.
  • defstruct should be deprecated. When Rich Hickey introduced types and protocols, he essentially said that defrecord should be preferred evermore.

Protocols are very useful, even though they feel like a bit of a departure from the (functions + data) paradigm. If you find yourself creating records, you should consider defining protocols as well.
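
A minimal sketch of records paired with a protocol. The Shape protocol and the Circle and Rect records are hypothetical names for illustration.

```clojure
;; Hypothetical protocol and records.
(defprotocol Shape
  (area [this] "Returns the area of the shape."))

(defrecord Circle [radius]
  Shape
  (area [_] (* Math/PI radius radius)))

(defrecord Rect [width height]
  Shape
  (area [_] (* width height)))

(area (->Rect 2 3))                     ; returns 6
(map area [(->Rect 2 3) (->Circle 1)])  ; dispatches on each record's type
```

Calling area on a value that does not implement Shape throws immediately, which is the kind of early failure the question is asking for.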

EDIT: I discovered another advantage to plain datatypes that hadn't been apparent to me earlier: if you're doing web programming, the plain datatypes can be converted to and from JSON efficiently and easily. (Libraries for doing this include clojure.data.json, clj-json, and my favourite, cheshire). With records and datatypes, the task is considerably more annoying.
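
One way to soften this, sketched here with core functions only, is to turn a record into a plain map before handing it to a JSON encoder and to rebuild it afterwards. The User record is hypothetical, and the JSON step itself is left out because it depends on the library you choose.

```clojure
;; Hypothetical record used only for illustration.
(defrecord User [name email])

(def u (->User "alice" "alice@example.com"))

(def plain (into {} u))      ; a plain map with the same keys and values
(def back (map->User plain)) ; rebuild the record from a map

(= u back) ; true
```

This assumes the decoded map has keyword keys; a decoder that returns string keys would need them converted before calling map->User.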

Rob Lachlan answered Sep 29 '22 19:09