
Optimal merging of triplets

I'm trying to come up with an algorithm for the following problem:

I've got a collection of triplets of integers - let's call these integers A, B, C. The stored values can be large, so it's generally impossible to create an array of size A, B, or C. The goal is to minimize the size of the collection. To do this, we're given a simple rule that allows us to merge triplets:

  • For two triplets (A, B, C) and (A', B', C'), if B == B' and C == C', remove the original triplets and add the triplet (A | A', B, C), where | is bitwise OR. Similar rules hold for B and C also.

In other words, if two values of two triplets are equal, remove these two triplets, bitwise OR the third values and place the result to the collection.
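To make the rule concrete, here is a minimal standalone sketch; the long[]{A, B, C} representation and the tryMerge name are mine, not part of the problem statement:

```java
public class MergeRuleDemo {
    // Returns the merged triplet, or null if no two components agree.
    // A triplet is represented as a long[3] = {A, B, C}.
    static long[] tryMerge(long[] r, long[] s) {
        if (r[1] == s[1] && r[2] == s[2]) return new long[]{r[0] | s[0], r[1], r[2]};
        if (r[0] == s[0] && r[2] == s[2]) return new long[]{r[0], r[1] | s[1], r[2]};
        if (r[0] == s[0] && r[1] == s[1]) return new long[]{r[0], r[1], r[2] | s[2]};
        return null;
    }

    public static void main(String[] args) {
        // (5, 2, 3) and (6, 2, 3) agree on B and C, so the A values are OR-ed:
        // 5 | 6 == 0b101 | 0b110 == 0b111 == 7, giving (7, 2, 3).
        long[] merged = tryMerge(new long[]{5, 2, 3}, new long[]{6, 2, 3});
        System.out.println(merged[0] + " " + merged[1] + " " + merged[2]); // 7 2 3
    }
}
```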

The greedy approach is usually misleading in cases like this, and so it is for this problem, but I can't find a simple counterexample that would point toward a correct solution. For a list of 250 items where the correct solution is 14, the average size produced by greedy merging is about 30 (it varies from roughly 20 to 70). The sub-optimal overhead grows as the list size increases.

I've also tried playing around with set bit counts, but found nothing meaningful - just the obvious fact that if the records are unique (which is safe to assume), the set bit count always increases.

Here's the stupid greedy implementation (it's just a conceptual thing, so please disregard the code style):

import java.util.ArrayList;
import java.util.List;

public class Record {
    long A;
    long B;
    long C;

    public static void main(String[] args) {
        List<Record> data = new ArrayList<>();
        // Fill it with some data

        boolean found;

        do {
            found = false;
            outer:
            for (int i = 0; i < data.size(); ++i) {
                for (int j = i+1; j < data.size(); ++j) {
                    try {
                        Record r = merge(data.get(i), data.get(j));
                        found = true;
                        data.remove(j);
                        data.remove(i);
                        data.add(r);
                        break outer;
                    } catch (IllegalArgumentException ignored) {
                    }
                }
            }
        } while (found);
    }

    public static Record merge(Record r1, Record r2) {
        if (r1.A == r2.A && r1.B == r2.B) {
            Record r = new Record();
            r.A = r1.A;
            r.B = r1.B;
            r.C = r1.C | r2.C;
            return r;
        }
        if (r1.A == r2.A && r1.C == r2.C) {
            Record r = new Record();
            r.A = r1.A;
            r.B = r1.B | r2.B;
            r.C = r1.C;
            return r;
        }
        if (r1.B == r2.B && r1.C == r2.C) {
            Record r = new Record();
            r.A = r1.A | r2.A;
            r.B = r1.B;
            r.C = r1.C;
            return r;
        }
        throw new IllegalArgumentException("Unable to merge these two records!");
    }
}

Do you have any idea how to solve this problem?

Asked by Danstahr on Mar 06 '14.


1 Answer

This is going to be a very long answer, sadly without an optimal solution (sorry). It is however a serious attempt at applying greedy problem solving to your problem, so it may be useful in principle. I didn't implement the last approach discussed, perhaps that approach can yield the optimal solution -- I can't guarantee that though.

Level 0: Not really greedy

By definition, a greedy algorithm uses a heuristic to choose the next step in a way that is locally optimal, i.e. optimal right now, hoping to reach the global optimum, which may or may not always be possible.

Your algorithm chooses any mergable pair, merges it, and moves on. It does not evaluate what the merge implies, or whether there is a better local choice. Because of this I wouldn't call your approach greedy at all; it is just a solution, an approach. I will call it the blind algorithm so that I can refer to it succinctly in my answer.

I will also use a slightly modified version of your algorithm which, instead of removing two triplets and appending the merged triplet, removes only the second triplet and replaces the first one with the merged one. The order of the resulting triplets differs, and thus possibly the final result as well. Let me run this modified algorithm over a representative data set, marking to-be-merged triplets with a *:

0: 3 2 3   3 2 3   3 2 3
1: 0 1 0*  0 1 2   0 1 2
2: 1 2 0   1 2 0*  1 2 1
3: 0 1 2*
4: 1 2 1   1 2 1*
5: 0 2 0   0 2 0   0 2 0

Result: 4
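For reference, the modified blind pass traced above can be sketched as follows; the names are my own, and a triplet is represented as a long[3]:

```java
import java.util.ArrayList;
import java.util.List;

public class BlindReduce {
    // Returns the merged triplet, or null if no two components agree.
    static long[] tryMerge(long[] r, long[] s) {
        if (r[1] == s[1] && r[2] == s[2]) return new long[]{r[0] | s[0], r[1], r[2]};
        if (r[0] == s[0] && r[2] == s[2]) return new long[]{r[0], r[1] | s[1], r[2]};
        if (r[0] == s[0] && r[1] == s[1]) return new long[]{r[0], r[1], r[2] | s[2]};
        return null;
    }

    // Repeatedly merge the first mergable pair found; the merged triplet
    // replaces the first of the pair in place, so the list order stays stable.
    static List<long[]> reduce(List<long[]> data) {
        boolean found;
        do {
            found = false;
            outer:
            for (int i = 0; i < data.size(); ++i) {
                for (int j = i + 1; j < data.size(); ++j) {
                    long[] m = tryMerge(data.get(i), data.get(j));
                    if (m != null) {
                        data.set(i, m);  // replace the first triplet
                        data.remove(j);  // drop the second
                        found = true;
                        break outer;
                    }
                }
            }
        } while (found);
        return data;
    }

    public static void main(String[] args) {
        List<long[]> data = new ArrayList<>(List.of(
                new long[]{3, 2, 3}, new long[]{0, 1, 0}, new long[]{1, 2, 0},
                new long[]{0, 1, 2}, new long[]{1, 2, 1}, new long[]{0, 2, 0}));
        System.out.println(reduce(data).size()); // 4, as in the trace above
    }
}
```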

Level 1: Greedy

To have a greedy algorithm, you need to formulate the merging decision in a way that allows for comparison of options, when multiple are available. For me, the intuitive formulation of the merging decision was:

If I merge these two triplets, will the resulting set have the maximum possible number of mergable triplets, when compared to the result of merging any other two triplets from the current set?

I repeat, this is intuitive for me. I have no proof that this leads to the globally optimal solution, not even that it will lead to a better-or-equal solution than the blind algorithm -- but it fits the definition of greedy (and is very easy to implement). Let's try it on the above data set, showing between each step, the possible merges (by indicating the indices of triplet pairs) and resulting number of mergables for each possible merge:

          mergables
0: 3 2 3  (1,3)->2
1: 0 1 0  (1,5)->1
2: 1 2 0  (2,4)->2
3: 0 1 2  (2,5)->2
4: 1 2 1
5: 0 2 0

Any choice except merging triplets 1 and 5 is fine. If we take the first pair, we get the same interim set as with the blind algorithm (this time I will collapse the indices to remove gaps):

          mergables
0: 3 2 3  (2,3)->0
1: 0 1 2  (2,4)->1
2: 1 2 0
3: 1 2 1
4: 0 2 0

This is where this algorithm chooses differently: it merges triplets 2 and 4, because one more merge is still possible afterwards, in contrast to the choice made by the blind algorithm:

          mergables
0: 3 2 3  (2,3)->0   3 2 3
1: 0 1 2             0 1 2
2: 1 2 0             1 2 1
3: 1 2 1

Result: 3
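The heuristic above can be sketched like this - again with my own names, and as a sketch rather than the answerer's actual code:

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyReduce {
    static long[] tryMerge(long[] r, long[] s) {
        if (r[1] == s[1] && r[2] == s[2]) return new long[]{r[0] | s[0], r[1], r[2]};
        if (r[0] == s[0] && r[2] == s[2]) return new long[]{r[0], r[1] | s[1], r[2]};
        if (r[0] == s[0] && r[1] == s[1]) return new long[]{r[0], r[1], r[2] | s[2]};
        return null;
    }

    // Number of pairs that could currently be merged.
    static int countMergable(List<long[]> data) {
        int n = 0;
        for (int i = 0; i < data.size(); ++i)
            for (int j = i + 1; j < data.size(); ++j)
                if (tryMerge(data.get(i), data.get(j)) != null) ++n;
        return n;
    }

    // Perform the one merge whose result leaves the most mergable pairs.
    // Returns false when no merge is possible.
    static boolean greedyStep(List<long[]> data) {
        int bestI = -1, bestJ = -1, bestScore = -1;
        long[] bestMerged = null;
        for (int i = 0; i < data.size(); ++i) {
            for (int j = i + 1; j < data.size(); ++j) {
                long[] m = tryMerge(data.get(i), data.get(j));
                if (m == null) continue;
                List<long[]> copy = new ArrayList<>(data); // simulate the merge
                copy.set(i, m);
                copy.remove(j);
                int score = countMergable(copy);
                if (score > bestScore) {
                    bestScore = score; bestI = i; bestJ = j; bestMerged = m;
                }
            }
        }
        if (bestI < 0) return false;
        data.set(bestI, bestMerged);
        data.remove(bestJ);
        return true;
    }

    public static void main(String[] args) {
        List<long[]> data = new ArrayList<>(List.of(
                new long[]{3, 2, 3}, new long[]{0, 1, 0}, new long[]{1, 2, 0},
                new long[]{0, 1, 2}, new long[]{1, 2, 1}, new long[]{0, 2, 0}));
        while (greedyStep(data)) { }
        System.out.println(data.size()); // 3 on this set, as shown above
    }
}
```

Ties are broken by the first pair found in scan order, so other tie-breaking rules may give different results.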

Level 2: Very greedy

Now, a second step beyond this intuitive heuristic is to look one merge further ahead and ask the heuristic question then. Generalized, you look ahead k merges further, apply the above heuristic, backtrack, and decide on the best option. This gets very verbose, so to illustrate, I will only perform one step of this new heuristic with lookahead 1:

          mergables
0: 3 2 3  (1,3)->(2,3)->0
1: 0 1 0         (2,4)->1*
2: 1 2 0  (1,5)->(2,4)->0
3: 0 1 2  (2,4)->(1,3)->0
4: 1 2 1         (1,4)->0
5: 0 2 0  (2,5)->(1,3)->1*
                 (2,4)->1*

Merge sequences marked with an asterisk are the best options when this new heuristic is applied.

In case a verbal explanation is necessary:

Instead of checking how many merges are possible after each possible merge for the starting set, this time we check how many merges are possible after each possible merge for each resulting set after each possible merge for the starting set. And this is for lookahead 1. For lookahead n, you'd be reading a very long sentence repeating the part "after each possible merge for each resulting set" n times.
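A recursive sketch of this lookahead scoring (my own names; with k = 0 it reproduces the Level-1 heuristic):

```java
import java.util.ArrayList;
import java.util.List;

public class LookaheadDemo {
    static long[] tryMerge(long[] r, long[] s) {
        if (r[1] == s[1] && r[2] == s[2]) return new long[]{r[0] | s[0], r[1], r[2]};
        if (r[0] == s[0] && r[2] == s[2]) return new long[]{r[0], r[1] | s[1], r[2]};
        if (r[0] == s[0] && r[1] == s[1]) return new long[]{r[0], r[1], r[2] | s[2]};
        return null;
    }

    static int countMergable(List<long[]> data) {
        int n = 0;
        for (int i = 0; i < data.size(); ++i)
            for (int j = i + 1; j < data.size(); ++j)
                if (tryMerge(data.get(i), data.get(j)) != null) ++n;
        return n;
    }

    // Best mergable count reachable after performing k + 1 merges:
    // k = 0 asks the Level-1 question, larger k looks further ahead.
    static int lookaheadScore(List<long[]> data, int k) {
        int best = -1;
        for (int i = 0; i < data.size(); ++i) {
            for (int j = i + 1; j < data.size(); ++j) {
                long[] m = tryMerge(data.get(i), data.get(j));
                if (m == null) continue;
                List<long[]> copy = new ArrayList<>(data); // simulate the merge
                copy.set(i, m);
                copy.remove(j);
                int s = (k == 0) ? countMergable(copy) : lookaheadScore(copy, k - 1);
                best = Math.max(best, s);
            }
        }
        return Math.max(best, 0); // 0 when no merge is possible at all
    }

    public static void main(String[] args) {
        List<long[]> data = new ArrayList<>(List.of(
                new long[]{3, 2, 3}, new long[]{0, 1, 0}, new long[]{1, 2, 0},
                new long[]{0, 1, 2}, new long[]{1, 2, 1}, new long[]{0, 2, 0}));
        // Lookahead 0 reproduces the Level-1 table's best score of 2;
        // lookahead 1 reproduces the asterisked best score of 1 above.
        System.out.println(lookaheadScore(data, 0) + " " + lookaheadScore(data, 1)); // 2 1
    }
}
```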

Level 3: Let's cut the greed

If you look closely, the previous approach has disastrous performance for even moderate inputs and lookaheads (*). For inputs beyond 20 triplets, anything beyond a 4-merge lookahead takes unreasonably long. The idea here is to cut merge paths that look worse than an existing solution: if we want to perform lookahead 10, and one merge path yields fewer mergables after three merges than another path does after five, we may just as well cut the current merge path and try another one. This should save a lot of time and allow large lookaheads, which would hopefully get us closer to the globally optimal solution. I haven't implemented this one for testing, though.

(*): Assuming a large reduction of the input set is possible, the number of merges is proportional to the input size, and the lookahead roughly says how many of those merges you permute. So you are choosing lookahead merges out of |input| - a binomial coefficient that for lookahead ≪ |input| can be approximated as O(|input|^lookahead), which can also (rightfully) be read as "you are thoroughly screwed".
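For very small inputs, an exhaustive search over all merge sequences gives the ground truth that such pruning would approximate. This is my own brute-force baseline, not the answerer's proposed cut, and it is exponential, so it is usable only as a reference:

```java
import java.util.ArrayList;
import java.util.List;

public class ExhaustiveReduce {
    static long[] tryMerge(long[] r, long[] s) {
        if (r[1] == s[1] && r[2] == s[2]) return new long[]{r[0] | s[0], r[1], r[2]};
        if (r[0] == s[0] && r[2] == s[2]) return new long[]{r[0], r[1] | s[1], r[2]};
        if (r[0] == s[0] && r[1] == s[1]) return new long[]{r[0], r[1], r[2] | s[2]};
        return null;
    }

    // Explores every merge sequence and returns the smallest reachable
    // collection size. This is the brute force that the path-cutting idea
    // above would prune; without memoization it revisits many states.
    static int minSize(List<long[]> data) {
        int best = data.size();
        for (int i = 0; i < data.size(); ++i) {
            for (int j = i + 1; j < data.size(); ++j) {
                long[] m = tryMerge(data.get(i), data.get(j));
                if (m == null) continue;
                List<long[]> copy = new ArrayList<>(data);
                copy.set(i, m);
                copy.remove(j);
                best = Math.min(best, minSize(copy));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<long[]> data = new ArrayList<>(List.of(
                new long[]{3, 2, 3}, new long[]{0, 1, 0}, new long[]{1, 2, 0},
                new long[]{0, 1, 2}, new long[]{1, 2, 1}, new long[]{0, 2, 0}));
        System.out.println(minSize(data)); // 3: the greedy result is optimal here
    }
}
```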

Putting it all together

I was intrigued enough by this problem that I sat and coded this down in Python. Sadly, I was able to prove that different lookaheads yield possibly different results, and that even the blind algorithm occasionally gets it better than lookahead 1 or 2. This is a direct proof that the solution is not optimal (at least for lookahead ≪ |input|). See the source code and helper scripts, as well as proof-triplets on github. Be warned that, apart from memoization of merge results, I made no attempt at optimizing the code CPU-cycle-wise.

Answered by Irfy on Oct 05 '22.