My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare <code>previousOrder</code> to <code>currentOrder</code> (each is an <code>ArrayList<Cat></code>) and notify the cat-wranglers of any changes. Each cat is unique and can only appear once in each list (or not at all). Most of the time, the <code>previousOrder</code> and <code>currentOrder</code> lists have the same contents, in the same order, but any of the following can happen (from more frequent to less frequent): <ol> <li>The order of cats is scrambled completely</li> <li>Cats individually move up or down in the list</li> <li>New cats join, at a specific point in the convoy</li> <li>Cats leave the convoy</li> </ol> This appears like an edit distance problem to me. Ideally, I am looking for an algorithm that determines the steps required to make <code>previousOrder</code> match <code>currentOrder</code>: <ul> <li>MOVE <code>Fluffy</code> to position <code>12</code> </li> <li>INSERT <code>Snuggles</code> at position <code>37</code> </li> <li>DELETE <code>Mr. Chubbs</code> </li> <li>etc.</li> </ul> The algorithm should also recognize scenario #1, in which case the new order is communicated in its entirety. What's the best approach for this? (This post and that post pose similar questions, but they are both dealing with sorted lists. Mine are ordered, but unsorted.) EDIT The Levenshtein algorithm is a great suggestion, but I'm concerned about the time/space requirement of creating a matrix. My main goal is to determine and communicate the changes as quickly as possible. Something that is faster than finding the additions and sending message along the lines of "Here are the new cats, and here is the current order."

Here's an algorithm I put together to merge two lists, <code>old</code> and <code>new</code>. It's not the most elegant or efficient, but it seems to work okay for the data I'm using it for. <code>new</code> is the most updated list of data, and <code>old</code> is the out-of-date list that needs to get transformed into <code>new</code>. The algorithm performs its operations on the <code>old</code> list - removing, moving, and inserting items accordingly. <pre class="prettyprint"><code>for(item in old) if (new does not contain item) remove item from old for(item in new) if (item exists in old) if (position(item, old) == position(item, new)) continue // next loop iteration else move old item to position(item, new) else insert new item into old at position(item, new) </code></pre> The deletions are all done up front to make the positions of the items more predictable in the second loop. The driving force behind this was to sync a list of data from the server with <code><table></code> rows in a browser DOM (using javascript). It was needed because we didn't want to redraw the entire table whenever the data changed; the differences between the lists were likely to be small and only affect one or two rows. It may not be the algorithm you're looking for for your data. If not, let me know and I'll delete this. There are probably some optimizations that could be made for this. But it is performant and predictable enough for me and the data I'm working with.

Levenshtein distance metric. http://www.levenshtein.net/

An efficient way to solve this is by using dynamic programming. Wikipedia has pseudo-code for a closely related problem: Computing Levenshtein distance. Keeping track of the actual operations and incorporating the "scramble" operation shouldn't be too difficult.

I recently had to do this, except items could exist multiple times. This complicated things, but I was able to do it using look-ahead counters and some other craziness. It looks a lot like Rob's solution, so thanks to him for getting me started! First off, let's assume that we want to return the list of operations that will transform the first list into the second: <pre class="prettyprint"><code>public interface Operation { /** * Apply the operation to the given list. */ void apply(List<String> keys); } </code></pre> and we have some helper methods to construct operations. You actually don't need a "move" operation, and you could even have a "swap" as well (or instead), but this is what I went with: <pre class="prettyprint"><code>Operation delete(int index) { ... } Operation insert(int index, String key) { ... } Operation move(int from, int to) { ... } </code></pre> Now we'll define a special class to hold our look-ahead counts: <pre class="prettyprint"><code>class Counter { private Map<String, Integer> counts; Counter(List<String> keys) { counts = new HashMap<>(); for (String key : keys) { if (counts.containsKey(key)) { counts.put(key, counts.get(key) + 1); } else { counts.put(key, 1); } } } public int get(String key) { if (!counts.containsKey(key)) { return 0; } return counts.get(key); } public void dec(String key) { counts.put(key, counts.get(key) - 1); } } </code></pre> And a helper method to get the index of the next key in the list: <pre class="prettyprint"><code>int next(List<String> list, int start, String key) { for (int i = start; i < list.size(); i++) { if (list.get(i).equals(key)) { return i; } } throw new RuntimeException("next index not found for " + key); } </code></pre> Now we're ready to do the transform: <pre class="prettyprint"><code>List<Operation> transform(List<String> from, List<String> to) { List<Operation> operations = new ArrayList<>(); // make our own copy of the first, that we can mutate from = new ArrayList<>(from); // maintain lookahead counts Counter fromCounts = new Counter(from); Counter toCounts = new Counter(to); // do all our deletes first for (int i = 0; i < from.size(); i++) { String current = from.get(i); if (fromCounts.get(current) > toCounts.get(current)) { Operation op = delete(i); operations.add(op); op.apply(from); fromCounts.dec(current); i--; } } // then one more iteration for the inserts and moves for (int i = 0; i < to.size(); i++) { String current = to.get(i); if (from.size() > i && from.get(i).equals(current)) { fromCounts.dec(current); continue; } if (fromCounts.get(current) > 0) { Operation op = move(next(from, i + 1, current), i); operations.add(op); op.apply(from); fromCounts.dec(current); } else { Operation op = insert(i, current); operations.add(op); op.apply(from); } } return operations; } </code></pre> It's a bit tricky to get your head around, but basically you do the deletes so that you know for every key you are either inserting or moving. Then you run through the list again and if there's enough, you move one from the part of the list your haven't seen yet, otherwise insert. By the time you get to the end, it all lines up.

Java: Difference between two lists

Tags:

java

algorithm

list

edit-distance

My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare previousOrder to currentOrder (each is an ArrayList<Cat>) and notify the cat-wranglers of any changes.

Each cat is unique and can only appear once in each list (or not at all). Most of the time, the previousOrder and currentOrder lists have the same contents, in the same order, but any of the following can happen (from more frequent to less frequent):

The order of cats is scrambled completely
Cats individually move up or down in the list
New cats join, at a specific point in the convoy
Cats leave the convoy

This appears like an edit distance problem to me. Ideally, I am looking for an algorithm that determines the steps required to make previousOrder match currentOrder:

MOVE Fluffy to position 12
INSERT Snuggles at position 37
DELETE Mr. Chubbs
etc.

The algorithm should also recognize scenario #1, in which case the new order is communicated in its entirety.

What's the best approach for this?

(This post and that post pose similar questions, but they are both dealing with sorted lists. Mine are ordered, but unsorted.)

EDIT

The Levenshtein algorithm is a great suggestion, but I'm concerned about the time/space requirement of creating a matrix. My main goal is to determine and communicate the changes as quickly as possible. Something that is faster than finding the additions and sending message along the lines of "Here are the new cats, and here is the current order."

643

asked Jun 01 '11 13:06

Tony the Pony

5 Answers

Here's an algorithm I put together to merge two lists, old and new. It's not the most elegant or efficient, but it seems to work okay for the data I'm using it for.

new is the most updated list of data, and old is the out-of-date list that needs to get transformed into new. The algorithm performs its operations on the old list - removing, moving, and inserting items accordingly.

for(item in old)
    if (new does not contain item)
        remove item from old

for(item in new)
    if (item exists in old)
        if (position(item, old) == position(item, new))
            continue // next loop iteration
        else
            move old item to position(item, new)
    else
        insert new item into old at position(item, new)

The deletions are all done up front to make the positions of the items more predictable in the second loop.

The driving force behind this was to sync a list of data from the server with <table> rows in a browser DOM (using javascript). It was needed because we didn't want to redraw the entire table whenever the data changed; the differences between the lists were likely to be small and only affect one or two rows. It may not be the algorithm you're looking for for your data. If not, let me know and I'll delete this.

There are probably some optimizations that could be made for this. But it is performant and predictable enough for me and the data I'm working with.

194

answered Sep 20 '22 02:09

Rob Hruska

Levenshtein distance metric.

http://www.levenshtein.net/

answered Sep 23 '22 02:09

bmargulies

An efficient way to solve this is by using dynamic programming. Wikipedia has pseudo-code for a closely related problem: Computing Levenshtein distance.

Keeping track of the actual operations and incorporating the "scramble" operation shouldn't be too difficult.

answered Sep 22 '22 02:09

NPE

I know the questioner was seeking a Java solution, but I came across this question whilst seeking an algorithm to implement in C#.

Here's my solution, which generates an enumeration of simple IListDifference values: either ItemAddedDifference, ItemRemovedDifference or ItemMovedDifference.

It uses a working copy of the source list to establish, item by item, what modifications are necessary to transform it to match the target list.

public class ListComparer<T>
    {
        public IEnumerable<IListDifference> Compare(IEnumerable<T> source, IEnumerable<T> target)
        {
            var copy = new List<T>(source);

            for (var i = 0; i < target.Count(); i++)
            {
                var currentItemsMatch = false;

                while (!currentItemsMatch)
                {
                    if (i < copy.Count && copy[i].Equals(target.ElementAt(i)))
                    {
                        currentItemsMatch = true;
                    }
                    else if (i == copy.Count())
                    {
                        // the target item's index is at the end of the source list
                        copy.Add(target.ElementAt(i));
                        yield return new ItemAddedDifference { Index = i };
                    }
                    else if (!target.Skip(i).Contains(copy[i]))
                    {
                        // the source item cannot be found in the remainder of the target, therefore
                        // the item in the source has been removed 
                        copy.RemoveAt(i);
                        yield return new ItemRemovedDifference { Index = i };
                    }
                    else if (!copy.Skip(i).Contains(target.ElementAt(i)))
                    {
                        // the target item cannot be found in the remainder of the source, therefore
                        // the item in the source has been displaced by a new item
                        copy.Insert(i, target.ElementAt(i));
                        yield return new ItemAddedDifference { Index = i };
                    }
                    else
                    {
                        // the item in the source has been displaced by an existing item
                        var sourceIndex = i + copy.Skip(i).IndexOf(target.ElementAt(i));
                        copy.Insert(i, copy.ElementAt(sourceIndex));
                        copy.RemoveAt(sourceIndex + 1);
                        yield return new ItemMovedDifference { FromIndex = sourceIndex, ToIndex = i };
                    }
                }
            }

            // Remove anything remaining in the source list
            for (var i = target.Count(); i < copy.Count; i++)
            {
                copy.RemoveAt(i);
                yield return new ItemRemovedDifference { Index = i };
            }
        }
    }

Just noticed this makes use of a custom extension method on IEnumerable - 'IndexOf':

public static class EnumerableExtensions
{
    public static int IndexOf<T>(this IEnumerable<T> list, T item)
    {
        for (var i = 0; i < list.Count(); i++)
        {
            if (list.ElementAt(i).Equals(item))
            {
                return i;
            }
        }

        return -1;
    }
}

answered Sep 21 '22 02:09

sandy

I recently had to do this, except items could exist multiple times. This complicated things, but I was able to do it using look-ahead counters and some other craziness. It looks a lot like Rob's solution, so thanks to him for getting me started!

First off, let's assume that we want to return the list of operations that will transform the first list into the second:

public interface Operation {
    /**
     * Apply the operation to the given list.
     */
    void apply(List<String> keys);
}

and we have some helper methods to construct operations. You actually don't need a "move" operation, and you could even have a "swap" as well (or instead), but this is what I went with:

Operation delete(int index) { ... }
Operation insert(int index, String key) { ... }
Operation move(int from, int to) { ... }

Now we'll define a special class to hold our look-ahead counts:

class Counter {
    private Map<String, Integer> counts;

    Counter(List<String> keys) {
        counts = new HashMap<>();

        for (String key : keys) {
            if (counts.containsKey(key)) {
                counts.put(key, counts.get(key) + 1);
            } else {
                counts.put(key, 1);
            }
        }
    }

    public int get(String key) {
        if (!counts.containsKey(key)) {
            return 0;
        }

        return counts.get(key);
    }

    public void dec(String key) {
        counts.put(key, counts.get(key) - 1);
    }
}

And a helper method to get the index of the next key in the list:

int next(List<String> list, int start, String key) {
    for (int i = start; i < list.size(); i++) {
        if (list.get(i).equals(key)) {
            return i;
        }
    }

    throw new RuntimeException("next index not found for " + key);
}

Now we're ready to do the transform:

List<Operation> transform(List<String> from, List<String> to) {
    List<Operation> operations = new ArrayList<>();

    // make our own copy of the first, that we can mutate
    from = new ArrayList<>(from);

    // maintain lookahead counts
    Counter fromCounts = new Counter(from);
    Counter toCounts = new Counter(to);

    // do all our deletes first
    for (int i = 0; i < from.size(); i++) {
        String current = from.get(i);

        if (fromCounts.get(current) > toCounts.get(current)) {
            Operation op = delete(i);
            operations.add(op);
            op.apply(from);
            fromCounts.dec(current);
            i--;
        }
    }

    // then one more iteration for the inserts and moves
    for (int i = 0; i < to.size(); i++) {
        String current = to.get(i);

        if (from.size() > i && from.get(i).equals(current)) {
            fromCounts.dec(current);
            continue;
        }

        if (fromCounts.get(current) > 0) {
            Operation op = move(next(from, i + 1, current), i);
            operations.add(op);
            op.apply(from);

            fromCounts.dec(current);
        } else {
            Operation op = insert(i, current);
            operations.add(op);
            op.apply(from);
        }
    }

    return operations;
}

It's a bit tricky to get your head around, but basically you do the deletes so that you know for every key you are either inserting or moving. Then you run through the list again and if there's enough, you move one from the part of the list your haven't seen yet, otherwise insert. By the time you get to the end, it all lines up.

answered Sep 24 '22 02:09

Phil Kulak

Related questions
                            
                                Interfaces and Abstract classes confusion in Java with examples
                            
                                Canvas - good rendering practices?
                            
                                Is it safe to use a Spring @Transactional annotation on a Java 8 default interface method?
                            
                                java.nio.file: Where is the Path interface actually implemented?
                            
                                Abstract class as parcelable
                            
                                What is the difference between BigDecimal movePointRight and scaleByPowerOfTen?
                            
                                Why does ArrayList use Object[] (instead of E[]) internally? [duplicate]
                            
                                How to get Spring Boot and OAuth2 example to use password grant credentials other than the default
                            
                                Applying AppCompat theme to individual preferences in a PreferenceFragment
                            
                                How to replace 'if statement' with rx-java for avoiding callback hell?
                            
                                Secure / Encrypt log4j files
                            
                                What does the JVM do when 'new' operator initializes the memory using the constructor?
                            
                                Hibernate interceptor and event listeners
                            
                                How to handle kafka publishing failure in robust way
                            
                                minifyEnabled causing app to crash on release mode
                            
                                High number of live/dead tuples in postgresql/ Vacuum not working
                            
                                Looking for a simple Java API for creating graphs (edges + nodes) [closed]
                            
                                Is Java's TimeZone thread-safe?
                            
                                jsse handshake_failure on public https web site
                            
                                Jstack and Jstat stopped working with upgrade to JDK6u23

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With