Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java: Difference between two lists

My company's cat-herding application tracks a convoy of cats. Periodically, it needs to compare previousOrder to currentOrder (each is an ArrayList<Cat>) and notify the cat-wranglers of any changes.

Each cat is unique and can only appear once in each list (or not at all). Most of the time, the previousOrder and currentOrder lists have the same contents, in the same order, but any of the following can happen (from more frequent to less frequent):

  1. The order of cats is scrambled completely
  2. Cats individually move up or down in the list
  3. New cats join, at a specific point in the convoy
  4. Cats leave the convoy

This appears like an edit distance problem to me. Ideally, I am looking for an algorithm that determines the steps required to make previousOrder match currentOrder:

  • MOVE Fluffy to position 12
  • INSERT Snuggles at position 37
  • DELETE Mr. Chubbs
  • etc.

The algorithm should also recognize scenario #1, in which case the new order is communicated in its entirety.

What's the best approach for this?

(This post and that post pose similar questions, but they are both dealing with sorted lists. Mine are ordered, but unsorted.)

EDIT

The Levenshtein algorithm is a great suggestion, but I'm concerned about the time/space requirement of creating a matrix. My main goal is to determine and communicate the changes as quickly as possible. Something that is faster than finding the additions and sending message along the lines of "Here are the new cats, and here is the current order."

like image 643
Tony the Pony Avatar asked Jun 01 '11 13:06

Tony the Pony


People also ask

How do I compare two lists in Java?

Java provides a method for comparing two Array List. The ArrayList. equals() is the method used for comparing two Array List. It compares the Array lists as, both Array lists should have the same size, and all corresponding pairs of elements in the two Array lists are equal.

How do you compare two lists of objects?

Java equals() method of List interface compares the specified object with the list for equality. It overrides the equals() method of Object class. This method accepts an object to be compared for equality with the list. It returns true if the specified object is equal to the list, else returns false.


5 Answers

Here's an algorithm I put together to merge two lists, old and new. It's not the most elegant or efficient, but it seems to work okay for the data I'm using it for.

new is the most updated list of data, and old is the out-of-date list that needs to get transformed into new. The algorithm performs its operations on the old list - removing, moving, and inserting items accordingly.

for(item in old)
    if (new does not contain item)
        remove item from old

for(item in new)
    if (item exists in old)
        if (position(item, old) == position(item, new))
            continue // next loop iteration
        else
            move old item to position(item, new)
    else
        insert new item into old at position(item, new)

The deletions are all done up front to make the positions of the items more predictable in the second loop.

The driving force behind this was to sync a list of data from the server with <table> rows in a browser DOM (using javascript). It was needed because we didn't want to redraw the entire table whenever the data changed; the differences between the lists were likely to be small and only affect one or two rows. It may not be the algorithm you're looking for for your data. If not, let me know and I'll delete this.

There are probably some optimizations that could be made for this. But it is performant and predictable enough for me and the data I'm working with.

like image 194
Rob Hruska Avatar answered Sep 20 '22 02:09

Rob Hruska


Levenshtein distance metric.

http://www.levenshtein.net/

like image 32
bmargulies Avatar answered Sep 23 '22 02:09

bmargulies


An efficient way to solve this is by using dynamic programming. Wikipedia has pseudo-code for a closely related problem: Computing Levenshtein distance.

Keeping track of the actual operations and incorporating the "scramble" operation shouldn't be too difficult.

like image 36
NPE Avatar answered Sep 22 '22 02:09

NPE


I know the questioner was seeking a Java solution, but I came across this question whilst seeking an algorithm to implement in C#.

Here's my solution, which generates an enumeration of simple IListDifference values: either ItemAddedDifference, ItemRemovedDifference or ItemMovedDifference.

It uses a working copy of the source list to establish, item by item, what modifications are necessary to transform it to match the target list.

public class ListComparer<T>
    {
        public IEnumerable<IListDifference> Compare(IEnumerable<T> source, IEnumerable<T> target)
        {
            var copy = new List<T>(source);

            for (var i = 0; i < target.Count(); i++)
            {
                var currentItemsMatch = false;

                while (!currentItemsMatch)
                {
                    if (i < copy.Count && copy[i].Equals(target.ElementAt(i)))
                    {
                        currentItemsMatch = true;
                    }
                    else if (i == copy.Count())
                    {
                        // the target item's index is at the end of the source list
                        copy.Add(target.ElementAt(i));
                        yield return new ItemAddedDifference { Index = i };
                    }
                    else if (!target.Skip(i).Contains(copy[i]))
                    {
                        // the source item cannot be found in the remainder of the target, therefore
                        // the item in the source has been removed 
                        copy.RemoveAt(i);
                        yield return new ItemRemovedDifference { Index = i };
                    }
                    else if (!copy.Skip(i).Contains(target.ElementAt(i)))
                    {
                        // the target item cannot be found in the remainder of the source, therefore
                        // the item in the source has been displaced by a new item
                        copy.Insert(i, target.ElementAt(i));
                        yield return new ItemAddedDifference { Index = i };
                    }
                    else
                    {
                        // the item in the source has been displaced by an existing item
                        var sourceIndex = i + copy.Skip(i).IndexOf(target.ElementAt(i));
                        copy.Insert(i, copy.ElementAt(sourceIndex));
                        copy.RemoveAt(sourceIndex + 1);
                        yield return new ItemMovedDifference { FromIndex = sourceIndex, ToIndex = i };
                    }
                }
            }

            // Remove anything remaining in the source list
            for (var i = target.Count(); i < copy.Count; i++)
            {
                copy.RemoveAt(i);
                yield return new ItemRemovedDifference { Index = i };
            }
        }
    }

Just noticed this makes use of a custom extension method on IEnumerable - 'IndexOf':

public static class EnumerableExtensions
{
    public static int IndexOf<T>(this IEnumerable<T> list, T item)
    {
        for (var i = 0; i < list.Count(); i++)
        {
            if (list.ElementAt(i).Equals(item))
            {
                return i;
            }
        }

        return -1;
    }
}
like image 35
sandy Avatar answered Sep 21 '22 02:09

sandy


I recently had to do this, except items could exist multiple times. This complicated things, but I was able to do it using look-ahead counters and some other craziness. It looks a lot like Rob's solution, so thanks to him for getting me started!

First off, let's assume that we want to return the list of operations that will transform the first list into the second:

public interface Operation {
    /**
     * Apply the operation to the given list.
     */
    void apply(List<String> keys);
}

and we have some helper methods to construct operations. You actually don't need a "move" operation, and you could even have a "swap" as well (or instead), but this is what I went with:

Operation delete(int index) { ... }
Operation insert(int index, String key) { ... }
Operation move(int from, int to) { ... }

Now we'll define a special class to hold our look-ahead counts:

class Counter {
    private Map<String, Integer> counts;

    Counter(List<String> keys) {
        counts = new HashMap<>();

        for (String key : keys) {
            if (counts.containsKey(key)) {
                counts.put(key, counts.get(key) + 1);
            } else {
                counts.put(key, 1);
            }
        }
    }

    public int get(String key) {
        if (!counts.containsKey(key)) {
            return 0;
        }

        return counts.get(key);
    }

    public void dec(String key) {
        counts.put(key, counts.get(key) - 1);
    }
}

And a helper method to get the index of the next key in the list:

int next(List<String> list, int start, String key) {
    for (int i = start; i < list.size(); i++) {
        if (list.get(i).equals(key)) {
            return i;
        }
    }

    throw new RuntimeException("next index not found for " + key);
}

Now we're ready to do the transform:

List<Operation> transform(List<String> from, List<String> to) {
    List<Operation> operations = new ArrayList<>();

    // make our own copy of the first, that we can mutate
    from = new ArrayList<>(from);

    // maintain lookahead counts
    Counter fromCounts = new Counter(from);
    Counter toCounts = new Counter(to);

    // do all our deletes first
    for (int i = 0; i < from.size(); i++) {
        String current = from.get(i);

        if (fromCounts.get(current) > toCounts.get(current)) {
            Operation op = delete(i);
            operations.add(op);
            op.apply(from);
            fromCounts.dec(current);
            i--;
        }
    }

    // then one more iteration for the inserts and moves
    for (int i = 0; i < to.size(); i++) {
        String current = to.get(i);

        if (from.size() > i && from.get(i).equals(current)) {
            fromCounts.dec(current);
            continue;
        }

        if (fromCounts.get(current) > 0) {
            Operation op = move(next(from, i + 1, current), i);
            operations.add(op);
            op.apply(from);

            fromCounts.dec(current);
        } else {
            Operation op = insert(i, current);
            operations.add(op);
            op.apply(from);
        }
    }

    return operations;
}

It's a bit tricky to get your head around, but basically you do the deletes so that you know for every key you are either inserting or moving. Then you run through the list again and if there's enough, you move one from the part of the list your haven't seen yet, otherwise insert. By the time you get to the end, it all lines up.

like image 1
Phil Kulak Avatar answered Sep 24 '22 02:09

Phil Kulak