How can I group consecutive elements of list using start/stop predicates?

Question

Suppose I have a list like:

(def data [:a :b :c :d :e :f :g :h :b :d :x])

and predicates like:

(defn start? [x] (= x :b))
(defn stop?  [x] (= x :d))

that mark the first & last elements of a sub-sequence. I want to return a list with subgroups like so:

(parse data) => [:a [:b :c :d] :e :f :g :h [:b :d] :x]

How can I use Clojure to accomplish this task?

exupero · Accepted Answer

You could use a custom stateful transducer:

(defn subgroups [start? stop?]
  (let [subgroup (volatile! nil)]
    (fn [rf]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result item]
         (let [sg @subgroup]
           (cond
             (and (seq sg) (stop? item))
             (do (vreset! subgroup nil)
               (rf result (conj sg item)))
             (seq sg)
             (do (vswap! subgroup conj item)
               result)
             (start? item)
             (do (vreset! subgroup [item])
               result)
             :else (rf result item))))))))

(into []
      (subgroups #{:b} #{:d})
      [:a :b :c :d :e :f :g :h :b :d :x])
; => [:a [:b :c :d] :e :f :g :h [:b :d] :x]

Taylor Wood · Answer

I like the stateful transducer answer, but noticed the question doesn't say what the behavior should be if a start element is found but no stop element is found. If a subgroup is left open the transducer will truncate the input sequence, which might be unexpected/undesirable. Consider the example with stop elements removed:

(into [] (subgroups #{:b} #{:d}) [:a :b :c :e :f :g :h :b :x])
=> [:a] ;; drops inputs from before (last) subgroup opens

Transducers have a completing arity that could be used to flush any open subgroup in this case:

Completion (arity 1) - some processes will not end, but for those that do (like transduce), the completion arity is used to produce a final value and/or flush state. This arity must call the xf completion arity exactly once.

The only difference in this example and the original transducer example is the completing arity:

(defn subgroups-all [start? stop?]
  (let [subgroup (volatile! nil)]
    (fn [rf]
      (fn
        ([] (rf))
        ([result] ;; completing arity flushes open subgroup
         (let [sg @subgroup]
           (if (seq sg)
             (do (vreset! subgroup nil)
                 (rf result sg))
             (rf result))))
        ([result item]
         (let [sg @subgroup]
           (cond
             (and (seq sg) (stop? item))
             (do (vreset! subgroup nil)
                 (rf result (conj sg item)))
             (seq sg)
             (do (vswap! subgroup conj item)
                 result)
             (start? item)
             (do (vreset! subgroup [item])
                 result)
             :else (rf result item))))))))

Then dangling, open groups will be flushed:

(into [] (subgroups-all #{:b} #{:d}) [:a :b :c :d :e :f :g :h :b :x])
=> [:a [:b :c :d] :e :f :g :h [:b :x]]
(into [] (subgroups-all #{:b} #{:d}) [:a :b :c :e :f :g :h :b :x])
=> [:a [:b :c :e :f :g :h :b :x]]

Notice in the last example that nested starts/opens don't result in nested groupings, which got me thinking about another solution...

Nested groups and zippers

When I thought about this more generally as "unflattening" a sequence, zippers came to mind:

(defn unflatten [open? close? coll]
  (when (seq coll)
    (z/root
     (reduce
      (fn [loc elem]
        (cond
          (open? elem)
          (-> loc (z/append-child (list elem)) z/down z/rightmost)
          (and (close? elem) (z/up loc))
          (-> loc (z/append-child elem) z/up)
          :else (z/append-child loc elem)))
      (z/seq-zip ())
      coll))))

This creates a zipper on an empty list and builds it up using reduce over the input sequence. It takes a pair of predicates for opening/closing groups and allows for arbitrarily nested groups:

(unflatten #{:b} #{:d} [:a :b :c :b :d :d :e :f])
=> (:a (:b :c (:b :d) :d) :e :f)
(unflatten #{:b} #{:d} [:a :b :c :b :d :b :b :d :e :f])
=> (:a (:b :c (:b :d) (:b (:b :d) :e :f)))
(unflatten #{:b} #{:d} [:b :c :e :f])
=> ((:b :c :e :f))
(unflatten #{:b} #{:d} [:d :c :e :f])
=> (:d :c :e :f)
(unflatten #{:b} #{:d} [:c :d])
=> (:c :d)
(unflatten #{:b} #{:d} [:c :d :b])
=> (:c :d (:b))

How can I group consecutive elements of list using start/stop predicates?

Tags:

clojure

Alan Thompson

2 Answers

exupero

Nested groups and zippers

Taylor Wood

Recent Activity

Donate For Us

How can I group consecutive elements of list using start/stop predicates?

Tags:

clojure

Alan Thompson

2 Answers

exupero

Nested groups and zippers

Taylor Wood

Related questions

Recent Activity

Donate For Us