Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a algorithm to extract the minimum number of Cartesian products from a set of formulas?

For example, we have a set of formulas as below:

B*2*j
B*3*i
B*3*j
C*2*j
C*3*i
C*3*j
D*2*i
D*2*j
D*3*i
D*3*j

And we could have three Cartesian products to represent the formulas above:

D*(2+3)*(i+j)
(B+c)*3*(i+j)
(B+C)*2*j

So the total number is 3. And we could also have:

3*(B+C+D)*(i+j)
2*(B+C)*D
2*D*(i+j)

which is also 3.

I wanna ask that is there a algorithm to determine the minimum number of Cartesian products from a set of formulas? And also come up with these products?

like image 950
injoy Avatar asked Oct 21 '22 07:10

injoy


2 Answers

First, I'll write a set of formulas as terms separated by +, since the transformation you're looking for makes sense algebraically (apart from the fact that you don't want to combine numbers like 2+3 into 5).

The basic operation that you have available is factorising: combining two terms like ABC+ABD into AB(C+D). Based on your comment, you can only generate new factors that consist of a sum of single-factor terms, like C+D in the previous example; you're not allowed to factorise e.g. ABCD+ABDE into AB(CD+DE).

You can factorise 2 k-factor terms if and only if they share exactly k-1 factors. (E.g. k=3 in my ABC+ABD example.) Every such factorisation reduces the number of terms in the set by 1: 2 are removed and 1 is added back in.

Doing this multiple times works when combining 3 or more terms: ABC+ABD+ABE can first be factorised into AB(C+D)+ABE and then those 2 terms factorised again into AB(C+D+E). Notice that it doesn't matter in which order we list terms in a sum or factors in a product, and nor does it matter in which order we perform factorisation steps when building a factor containing 3 or more terms.

We can then frame the problem as a search problem in a graph, in which the start vertex corresponds to the original formula (B*2*j + B*3*i + ... + D*3*j in your example) and from each vertex v there emanate arcs to its child vertices, which each correspond to the result of performing some factorisation on v. v will have a child vertex for each possible factorisation that could be performed on it; if there are m terms in v, then this means it could have up to m(m-1)/2 children in the worst case, because it could be that all m terms share a full complement of k-1 factors, meaning that any pair of them could be combined.

If a vertex has no pair of terms that can be combined via factorisation then it is a "leaf" -- it has no children, and can't be processed further. What we want to find is a leaf vertex that has the fewest number of terms. Since every factorisation, corresponding to an arc in the graph, reduces the number of terms by 1, this is equivalent to searching for a deepest-possible vertex. This can be done using DFS or BFS. Note however that the same expression (vertex) can be generated many times over using this approach, so it will be crucial for performance to maintain a hashtable seen that records all expressions that have already been processed; then if we visit a vertex, try to generate a child for it, and see that this child is already in seen, we avoid visiting this child a second time.

To mitigate against the phenomenon of the same expression being generated via multiple different orderings of the same set of factorisations, you can add a rule: order v's child factorisations somehow, so that if there are n children they correspond to factorisations 1, 2, ..., n in this ordering, and record in a separate "already skipped" field in each child vertex the set of earlier (in the ordering) factorisations that were skipped over to generate this child. Then, when visiting a vertex, avoid generating any of its "already skipped" factorisations as children, since doing so would create a vertex that is identical to some other existing vertex (by performing the same pair of operations in reverse order).

There are probably other speedups available that will reduce the number of duplicate vertices that are generated in the first place, but this should be enough to get results for small problems.

like image 107
j_random_hacker Avatar answered Nov 02 '22 04:11

j_random_hacker


Write down you sum in matrix form. Then what you are asking for is the rank of that matrix, and a corresponding decomposition into dyadic products. This decomposition is far from unique.

            [ 3  5 ]   [ i ] 
[ B C D ] * | 3  5 | * [ j ]
            [ 5  5 ]   

As one can see, the matrix in the middle has full rank 2


If you intend to use 2 and 3 also as variables, then you are asking to decompose a tensor of order 3 into a minimum number of terms that factorize, i.e., that are tensor products of vectors.

like image 20
Lutz Lehmann Avatar answered Nov 02 '22 04:11

Lutz Lehmann