Suppose I have a list like:
(def data [:a :b :c :d :e :f :g :h :b :d :x])
and predicates like:
(defn start? [x] (= x :b))
(defn stop? [x] (= x :d))
that mark the first & last elements of a sub-sequence. I want to return a list with subgroups like so:
(parse data) => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
How can I use Clojure to accomplish this task?
You could use a custom stateful transducer:
(defn subgroups [start? stop?]
(let [subgroup (volatile! nil)]
(fn [rf]
(fn
([] (rf))
([result] (rf result))
([result item]
(let [sg @subgroup]
(cond
(and (seq sg) (stop? item))
(do (vreset! subgroup nil)
(rf result (conj sg item)))
(seq sg)
(do (vswap! subgroup conj item)
result)
(start? item)
(do (vreset! subgroup [item])
result)
:else (rf result item))))))))
(into []
(subgroups #{:b} #{:d})
[:a :b :c :d :e :f :g :h :b :d :x])
; => [:a [:b :c :d] :e :f :g :h [:b :d] :x]
I like the stateful transducer answer, but noticed the question doesn't say what the behavior should be if a start element is found but no stop element is found. If a subgroup is left open the transducer will truncate the input sequence, which might be unexpected/undesirable. Consider the example with stop elements removed:
(into [] (subgroups #{:b} #{:d}) [:a :b :c :e :f :g :h :b :x])
=> [:a] ;; drops inputs from before (last) subgroup opens
Transducers have a completing arity that could be used to flush any open subgroup in this case:
Completion (arity 1) - some processes will not end, but for those that do (like transduce), the completion arity is used to produce a final value and/or flush state. This arity must call the xf completion arity exactly once.
The only difference in this example and the original transducer example is the completing arity:
(defn subgroups-all [start? stop?]
(let [subgroup (volatile! nil)]
(fn [rf]
(fn
([] (rf))
([result] ;; completing arity flushes open subgroup
(let [sg @subgroup]
(if (seq sg)
(do (vreset! subgroup nil)
(rf result sg))
(rf result))))
([result item]
(let [sg @subgroup]
(cond
(and (seq sg) (stop? item))
(do (vreset! subgroup nil)
(rf result (conj sg item)))
(seq sg)
(do (vswap! subgroup conj item)
result)
(start? item)
(do (vreset! subgroup [item])
result)
:else (rf result item))))))))
Then dangling, open groups will be flushed:
(into [] (subgroups-all #{:b} #{:d}) [:a :b :c :d :e :f :g :h :b :x])
=> [:a [:b :c :d] :e :f :g :h [:b :x]]
(into [] (subgroups-all #{:b} #{:d}) [:a :b :c :e :f :g :h :b :x])
=> [:a [:b :c :e :f :g :h :b :x]]
Notice in the last example that nested starts/opens don't result in nested groupings, which got me thinking about another solution...
When I thought about this more generally as "unflattening" a sequence, zippers came to mind:
(defn unflatten [open? close? coll]
(when (seq coll)
(z/root
(reduce
(fn [loc elem]
(cond
(open? elem)
(-> loc (z/append-child (list elem)) z/down z/rightmost)
(and (close? elem) (z/up loc))
(-> loc (z/append-child elem) z/up)
:else (z/append-child loc elem)))
(z/seq-zip ())
coll))))
This creates a zipper on an empty list and builds it up using reduce
over the input sequence. It takes a pair of predicates for opening/closing groups and allows for arbitrarily nested groups:
(unflatten #{:b} #{:d} [:a :b :c :b :d :d :e :f])
=> (:a (:b :c (:b :d) :d) :e :f)
(unflatten #{:b} #{:d} [:a :b :c :b :d :b :b :d :e :f])
=> (:a (:b :c (:b :d) (:b (:b :d) :e :f)))
(unflatten #{:b} #{:d} [:b :c :e :f])
=> ((:b :c :e :f))
(unflatten #{:b} #{:d} [:d :c :e :f])
=> (:d :c :e :f)
(unflatten #{:b} #{:d} [:c :d])
=> (:c :d)
(unflatten #{:b} #{:d} [:c :d :b])
=> (:c :d (:b))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With