Datomic: schema for a to-many relationship with a 'reset' operation

I'm looking for feedback on an approach for modeling certain to-many relationships in Datomic.

The problem

Suppose I want to design the Datomic schema for a domain where a Person has a list of favorite Movies. For instance, John's favorite movies are Gladiator, Star Wars, and Fight Club.

The most obvious schema for modeling this in Datomic is a cardinality-many attribute, e.g.:

#{["John" :person/favorite-movies "Gladiator"]
  ["John" :person/favorite-movies "Star Wars"]
  ["John" :person/favorite-movies "Fight Club"]}

This approach makes it easy to add or remove movies from the list (simply use :db/add and :db/retract), but I find it impractical for resetting the whole list of movies - you essentially need to compute a diff between the old list and the new, and that has to run in a transaction function. This gets even worse when the elements of the list are not scalars.
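
To make that cost concrete, the diff could be sketched in plain Clojure roughly as follows. `reset-tx-data` is a hypothetical helper name, not part of Datomic. It only builds transaction data from two collections, and the old values would still have to be read inside a transaction function to be consistent.

```clojure
(require '[clojure.set :as set])

(defn reset-tx-data
  "Given an entity id, an attribute, the currently asserted values and the
  desired values, returns the :db/retract and :db/add forms that turn the
  old set into the new one. Hypothetical helper, for illustration only."
  [eid attr old-values new-values]
  (let [old (set old-values)
        new (set new-values)]
    (vec
      (concat
        ;; values asserted now but absent from the desired list
        (for [v (set/difference old new)] [:db/retract eid attr v])
        ;; values in the desired list that are not asserted yet
        (for [v (set/difference new old)] [:db/add eid attr v])))))

(reset-tx-data "John" :person/favorite-movies
               ["Gladiator" "Star Wars" "Fight Club"]
               ["Gladiator" "True Romance"])
```

The call above retracts "Star Wars" and "Fight Club", adds "True Romance", and leaves "Gladiator" untouched.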

Alternative approach

As an alternative approach, I'm considering introducing an indirection using a set entity:

#{["John" :person/favorite-movies 42]
  [42 :set.string/contains "Gladiator"]
  [42 :set.string/contains "Star Wars"]
  [42 :set.string/contains "Fight Club"]}

With this approach, :person/favorite-movies is a cardinality-one, ref-typed attribute, and :set.string/contains is a cardinality-many, string-typed attribute. Resetting the list is then simply a matter of creating a new set entity:

[{:db/id "John"
  :person/favorite-movies {:db/id (d/tempid :db.part/user)
                           :set.string/contains ["Gladiator" 
                                                 "The Lord of the Rings"
                                                 "A Clockwork Orange"
                                                 "True Romance"]}}]

Are there known limitations to this approach of modeling to-many relationships?


Edit: A less trivial use case

It's more relevant to study this problem in a case where the relationship is ref-typed, not scalar-typed, because some issues only appear with ref-typed attributes in Datomic.

It's also more relevant to study a use case where a 'reset' operation for the relationship makes more sense, which is not really the case for 'favorite movies'.

Example: A form with checkboxes, in which a user may provide an Answer to a Question by selecting a set of Options. The user may update her Answer to the Question. The goal is to model the Answer - Option relationship.

A canonical Datomic schema for this information model would be:

  • :answer/id: unique id of the answer (scalar-typed, unique-identity)
  • :option/id: unique id of the option (scalar-typed, unique-identity)
  • :answer/selectedOptions (ref-typed, cardinality-many)
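
As transaction data, that schema might look like the sketch below. The string value type for the two id attributes is an assumption (the description above only says "scalar-typed"), and older Datomic releases may additionally expect explicit :db/id and :db.install/_attribute entries on each attribute.

```clojure
;; Sketch only: :db.type/string for the id attributes is assumed.
[{:db/ident       :answer/id
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}
 {:db/ident       :option/id
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}
 {:db/ident       :answer/selectedOptions
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/many}]
```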

Valentin Waeselynck asked Feb 08 '17 12:02


2 Answers

  • This technique is more complicated: you need to manage two entities instead of one.
  • You no longer have a useful index on favorite-movies values if you use a generic attr to hold the set members (:set.string/contains in your example). To get useful indexes back, you would need a pair of attributes: :person/favorite-movies and :person.favorite-movies/items for example.
  • Your history of changes to a user's favorite movies is more complicated to reconstruct. You can no longer simply look at :person/favorite-movies; you need to know which set entity it points to at any moment, and look at the history of that set entity.
  • Your application needs to distinguish between "I am resetting a set" vs "I am changing a set and want the changes merged." There may not actually be any such distinction in the application model.
  • You can end up with orphaned "set" entities with unreferenced data on them. For example: at the same time, one peer sends a reset (i.e. asserts a new set entity) and another peer adds an item to the existing set. If the second peer's transaction comes after the first, you now have an orphaned datom.

The best solution is to make granular changes. E.g., if users add or remove a specific item from the set, each add or remove should be a transaction with just that assertion or retraction. Set operations are commutative, so two users bashing on the same set will not cause any harm. (Unless you have derived data, in which case race conditions matter.)
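
A toy model in plain Clojure (no Datomic involved; `apply-op` is a made-up helper) shows why the order of two granular changes to different items does not matter:

```clojure
(defn apply-op
  "Applies one [:db/add v] or [:db/retract v] style operation to a set.
  A toy model of a cardinality-many attribute, not Datomic itself."
  [s [op v]]
  (case op
    :db/add     (conj s v)
    :db/retract (disj s v)))

(def current #{"Gladiator" "Star Wars"})
(def user-a [:db/add "Fight Club"])
(def user-b [:db/retract "Star Wars"])

;; Either transaction order yields the same final set.
(= (reduce apply-op current [user-a user-b])
   (reduce apply-op current [user-b user-a]))
;; => true
```

Order starts to matter only when one user adds and another retracts the very same item.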

If you really need the "reset the set, make it look like this" operation, a better solution is to use a transaction function that receives the entire set value you desire and computes the adds and retracts necessary to get the current value to be the new value you want. Here is a tx function that will do that:

{:db/ident :db.fn/resetAttribute
 :db/doc   "Unconditionally set an entity's attribute's values to those provided,
retracting all other existing values.

Values must be a collection (list, seq, vector), even for cardinality-one
attributes. An empty collection (or nil) will retract all values. The values
themselves must be primitive, i.e. no map forms are permitted for refs, use
tempids directly. If the attribute is-component, removed values will be
:db.fn/retractEntity-ed."
 :db/fn
 #db/fn {:lang   "clojure"
         :params [db ent attr values]
         :code   (let [eid       (datomic.api/entid db ent)
                       aid       (datomic.api/entid db attr)
                       {:keys [value-type is-component]} (datomic.api/attribute db aid)
                       newvalues (if (= value-type :db.type/ref)
                                   (into #{} (map #(if (string? %) % (datomic.api/entid db %))) values)
                                   (into #{} values))
                       oldvalues (into #{} (map :v) (datomic.api/datoms db :eavt eid aid))]
                   (-> []
                       (into (comp
                               (remove newvalues)
                               (map (if is-component
                                      #(do [:db.fn/retractEntity %])
                                      #(do [:db/retract eid aid %]))))
                         oldvalues)
                       (into (comp
                               (remove oldvalues)
                               (map #(do [:db/add eid aid %])))
                         newvalues)))}}

You would use it like this:

[:db.fn/resetAttribute [:person/id "John"] :person/favorite-movies
  ["Gladiator" "The Lord of the Rings" "A Clockwork Orange" "True Romance"]]

;; Or to retract *all* existing values:
[:db.fn/resetAttribute [:person/id "John"] :person/favorite-movies nil]

Francis Avila answered Nov 20 '22 10:11


Having experimented for a few months with this approach, here are my conclusions.

Both strategies (A - using a direct attribute vs B - using an intermediary, disposable entity) have practical advantages and drawbacks when it comes to reading and writing, as can be read in the question and Francis Avila's answer. But IMHO, the most important principle is this: the schema should be primarily determined by the domain model, not by the read and write patterns.

Are there domain models for which strategy B is appropriate? I believe so.

For instance, in the Question/Option/Answer example domain presented in the question, it may make more sense for the set of answers to be interpreted as a cohesive whole rather than as separate individual facts. Add a :submittedTime instant-typed attribute to the intermediary entity, and you've now modeled a revision of the answer (you don't want to rely on Datomic history to model that).


Note:

With Strategy A, implementing a 'reset' operation requires a transaction function; because of tricky concerns related to entity lifecycle ('does this entity already exist or not'), such a transaction function is not trivial to write in the most general case. My best shot at this can be found in the Datofu library.

Valentin Waeselynck answered Nov 20 '22 08:11