
Pre-aggregated data structure in Clojure

In OLAP cubes it's possible to do very quick lookups on large amounts of aggregated data. The major reason for this is that the data is pre-aggregated using operations that are easy to combine upwards (mainly +, -, mean, std, max, min and some more).

How can I get this "anti-lazy" behaviour in Clojure?

I'm thinking of something like

(def world-population {:africa 4e8            ;;this is an aggregation!
                       :africa/liberia 3.4e6
                       :africa/ethiopia 7.4e7
                       ...})

How can I update a data structure like this and make sure the parents of an entity are updated too? Does one have to roll one's own ref implementation?

claj asked Mar 12 '12 15:03


2 Answers

By storing your data in an atom, you can add watches, which are essentially callbacks invoked whenever the atom is updated.

Something like this:

(def world-population (atom {:africa 4e8
                             :africa/liberia 3.4e6
                             ...}))

;; The watch fn receives the watch key, the atom, and the old and new values.
(add-watch world-population :population-change-key
      (fn [key ref old new]
         (prn "population change")))

You could build some event propagation logic on top of that.
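One possible shape for that propagation logic is sketched below. It is only an illustration under a few assumptions: the names `leaf-population`, `region-totals` and `sum-by-parent` are hypothetical, the parent of a key such as `:africa/liberia` is taken to be its namespace (`:africa`), and the aggregation is a plain sum. The derived totals live in a second atom so that the watch never triggers itself.

```clojure
;; Leaf values only; the aggregates are derived, never edited by hand.
(def leaf-population (atom {:africa/liberia 3.4e6
                            :africa/ethiopia 7.4e7}))

(defn sum-by-parent
  "Sums leaf values per parent. Assumes every key is a namespaced keyword
  and that the namespace names the parent, e.g. :africa/liberia -> :africa."
  [leaves]
  (reduce-kv (fn [totals k v]
               (update totals (keyword (namespace k)) (fnil + 0) v))
             {}
             leaves))

;; Derived totals, initialised from the current leaves.
(def region-totals (atom (sum-by-parent @leaf-population)))

;; Recompute the totals from the new leaf map on every change.
(add-watch leaf-population :recompute-totals
           (fn [_key _ref _old new]
             (reset! region-totals (sum-by-parent new))))

(swap! leaf-population assoc :africa/liberia 4.0e6)
@region-totals
;; => {:africa 7.8E7}
```

Recomputing everything on each change is simple but does work proportional to the number of leaves; for large maps you would probably want to apply only the difference between `old` and `new`.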

sw1nn answered Nov 20 '22 00:11


You could write a recursive rollup function as a higher-order function, something like:

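(defn rollup
  "Fills in an aggregate value for every parent in hierarchy (a child->parent map)
  by applying func to the values of its children."
  ([data hierarchy func]
    ;; Walk up from an arbitrary parent until the root of the hierarchy is reached.
    (loop [top (second (first hierarchy))]
      (if (nil? (hierarchy top))
        (rollup data hierarchy func top)
        (recur (hierarchy top)))))
  ([data hierarchy func root]
    (let [children (reduce (fn [l [k v]] (if (= v root) (cons k l) l)) '() hierarchy)
          ;; Roll up any child that has no value yet before aggregating this node.
          data (reduce (fn [d c] (if (d c) d (rollup d hierarchy func c))) data children)
          child-values (map data children)]
      (assoc data root (apply func child-values)))))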

Which can then be used with any particular rollup operation or hierarchy you like:

(def populations { :africa/liberia 3.4e6
                   :africa/ethiopia 7.4e7})

(def geography {:africa/liberia :africa 
                :africa/ethiopia :africa
                :africa :world})

(rollup populations geography +)
=> {:africa           7.74E7, 
    :world            7.74E7, 
    :africa/ethiopia  7.4E7, 
    :africa/liberia   3400000.0}

Obviously it gets more complicated if you have very large data sets or multiple hierarchies etc., but this should be enough for many simple cases.
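If recomputing the whole rollup on every change is too expensive, one option is to push only the change upwards. The following is a minimal sketch, not part of the answer above: `ancestors-of`, `update-leaf` and `rolled-up` are hypothetical names, the hierarchy is the same child-to-parent map as `geography`, and the approach only works for aggregations that combine by addition.

```clojure
(def geography {:africa/liberia :africa
                :africa/ethiopia :africa
                :africa :world})

;; An already rolled-up map, as produced by summing the leaves.
(def rolled-up {:africa/liberia 3.4e6
                :africa/ethiopia 7.4e7
                :africa 7.74e7
                :world 7.74e7})

(defn ancestors-of
  "Walks the child->parent map upwards from k, e.g. (:africa :world)."
  [hierarchy k]
  (take-while some? (rest (iterate hierarchy k))))

(defn update-leaf
  "Sets leaf k to new-value and adds the difference to every ancestor.
  Only valid for aggregations that combine by addition."
  [data hierarchy k new-value]
  (let [delta (- new-value (get data k 0))]
    (reduce (fn [d parent] (update d parent (fnil + 0) delta))
            (assoc data k new-value)
            (ancestors-of hierarchy k))))

(update-leaf rolled-up geography :africa/liberia 4.4e6)
;; => {:africa/liberia 4400000.0, :africa/ethiopia 7.4E7, :africa 7.84E7, :world 7.84E7}
```

Because `update-leaf` is a pure function of the map, it could also be applied to data held in an atom with `(swap! some-atom update-leaf geography :africa/liberia 4.4e6)`, where `some-atom` is whatever atom holds the rolled-up map. Aggregations such as max or min cannot be patched with a delta like this and would need the affected parents recomputed from their children.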

mikera answered Nov 20 '22 01:11