Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient cartesian product algorithm ignoring terms

Let's say I have sets A_1,...A_n, e.g. [[a b c][d e][f]]. I would like to find Cartesian product of these sets but not including any terms which are supersets of elements of some ignore list.

For instance if my ignore list is [[a e][c]], the result of the Cartesian product would be [[a d f][b d f][b e f]]. Note any term with c is not in there, neither is [a e f].

Of course one way I could do this is to find the full cartesian product and then remove the offending items, but I would like a more efficient way, such that I avoid checking solutions in the first place.

I have an initial solution which involves incrementally building each term in the cart-product, and at each stage I remove any elements from A_i if adding them to the term I am building would cause it to be a superset of any one of the ignores. This works fine, and is better than the naive solution, but there is still a large amount of redundant checking, which also depeneds on the order in which the sets are presented. E.g. if [f] was in my ignore list, I would still keep trying to create terms until I reach [f] and then discard.

For concreteness, my clojure implementation is

(defn first-elements
  "Get the first elements of a set of sets, unless ignored"
  [sets ignores in-ignore?]
  (loop [product-tuple [] sets sets]
    (println "sets " sets)
    (cond
      (or (nil? sets) (nil? (first sets)))
      product-tuple

      :else
      (if-let [set-op (remove #(in-ignore? product-tuple ignores %) (first sets))]
        (if (and (coll? set-op) (empty? set-op))
            product-tuple
            (recur (conj product-tuple (first set-op)) (next sets)))
        product-tuple))))

(defn in-ignore?
  "if I add elem to this build will it become a superset of any of the ignores"
  [build ignores elem]
  (some  #(clojure.set/superset? (conj (set build) elem) %) ignores))

(defn cartesian-product-ignore
  "All the ways to take one item from each sequence, except for ignore"
  [ignores original-sets]
  (loop [cart-prod #{} sets original-sets]
    (let [firsts (first-elements sets ignores in-ignore?)]
      (print "firsts " firsts "-cart-prod " cart-prod " sets " sets "\n")
      (cond
        (zero? (count firsts))
        cart-prod

        (= (count sets) (count firsts))
        (recur (conj cart-prod firsts) (update-in sets [(dec (count sets))] next))

        :else
        (recur cart-prod (assoc
                           (update-in sets [(dec (count firsts))] next)
                           (count firsts)
                           (original-sets (count firsts))))))))
like image 623
zenna Avatar asked Oct 14 '13 21:10

zenna


2 Answers

I think there are some improvements that can be made over your current approach. But first, let's implement a basic cartisian-product. Then we can adapt it to accept an ignores list. This is easy enough using for and some recursion:

(defn cartesian-product [colls]
  (if (empty? colls)
    (list ())
    (for [e (first colls)
          sub-product (cartesian-product (rest colls))]
      (cons e sub-product))))

;; Quick test run
(cartesian-product [[:a :b :c] [:d :e] [:f]])
=> ((:a :d :f) (:a :e :f) (:b :d :f) (:b :e :f) (:c :d :f) (:c :e :f))

Good. And since we're using for, we have the advantage of laziness. If you need your result to be something other than a sequence of sequences, it's easy enough to convert it to something else.

Now, the hard part -- implementing the ignore sets. According to your description, your current approach is to remove elements from A_i if adding them to the term you are building would cause that term to become a superset of any of the ignore sets. As your code illustrates, not only is this somewhat inefficient (for example, superset? is worst-case linear time w.r.t. the size of its first parameter), but it also makes the code more complicated than it needs to be.

So let's adopt a different approach. Instead of removing elements from A_i, let's remove any elements we add to a term from the ignore sets. Then we can prune a term if any of the ignore sets are empty. As a bonus, all it requires is a few changes to our previous cartesian-product implementation:

(defn cartesian-product-ignore [ignore-sets colls]
  (cond (some empty? ignore-sets) () ; prune
        (empty? colls) (list ())     ; base case
        :else                        ; recursive case
        (for [e (first colls)
              sub-product (cartesian-product-ignore (map (fn [s]
                                                           (disj s e))
                                                         ignore-sets)
                                                    (rest colls))]
          (cons e sub-product))))

;; test without any ignore sets
(cartesian-product-ignore [] [[:a :b :c] [:d :e] [:f]])
=> ((:a :d :f) (:a :e :f) (:b :d :f) (:b :e :f) (:c :d :f) (:c :e :f))

;; Now the moment of truth
(cartesian-product-ignore [(set [:a :e]) (set [:c])] [[:a :b :c] [:d :e] [:f]])
=> ((:a :d :f) (:b :d :f) (:b :e :f))

Of course, minor changes may be required to fit your exact needs. For example, you might want to accept ignore sets as a vector or sequence and convert them to sets internally. But that is the essence of the algorithm..

like image 127
Nathan Davis Avatar answered Sep 20 '22 12:09

Nathan Davis


Here a core.logic (naive) approach

(ns testing
  (:refer-clojure :exclude [==])
  (:use [clojure.core.logic])
)
(run* [q]
       (fresh [x y z]
              (membero x [:a :b :c])
              (membero y [:d :e])
              (membero z [:f])
              (== q [x y z])
              (!= q [:a :e z] )
              (!= q [:c y z] )

              )
       )

==> ([:a :d :f] [:b :d :f] [:b :e :f])

Although it's much more slow than @Nathan_Davis algorithm, 23.263 msecs vs 0.109 msecs

like image 27
tangrammer Avatar answered Sep 19 '22 12:09

tangrammer