I have two Multimaps which have been created from two huge CSV files.
Multimap<String, SomeClassObject> mapOne = ArrayListMultimap.create();
Multimap<String, SomeClassObject> mapTwo = ArrayListMultimap.create();
I use one CSV column as the key, and each key has thousands of values associated with it. The data in these two Multimaps should be the same. Now I want to compare the data in the Multimaps and find out whether any values differ. Here are the two approaches I am considering:
Approach One:
Make one big list from each Multimap. This big list will contain a number of smaller lists. Each smaller list starts with a unique "key" read from the Multimap, followed by that key's associated values.
ArrayList<Collection<SomeClassObject>> bigList = new ArrayList<Collection<SomeClassObject>>();
Within bigList will be the individual small lists A, B, C, etc.
I plan to take each small list from the first file's bigList and check whether the second file's bigList has a small list with the same "key" element. If it does, I compare the two lists and report anything that does not match.
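Approach One does not strictly need the intermediate bigList, because Guava's Multimap.asMap() already groups each key with its values. The following is a minimal sketch of that per-key comparison, not the asker's code. It assumes the value class (SomeClassObject here) implements equals and hashCode, treats each key's values as an unordered bag (order ignored, duplicates counted), and uses hypothetical names (PerKeyComparison, keysWithDifferences). It takes plain java.util maps so it runs without Guava; since a Multimap's asMap() view has a compatible type, a call like keysWithDifferences(mapOne.asMap(), mapTwo.asMap()) should work.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerKeyComparison {

    // Counts how often each value occurs, so order does not matter but duplicates do.
    // Assumes V implements equals/hashCode consistently.
    static <V> Map<V, Integer> countValues(Collection<V> values) {
        Map<V, Integer> counts = new HashMap<>();
        for (V v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    // Returns every key whose values differ between the two maps,
    // including keys that exist in only one of them.
    static <K, V> Set<K> keysWithDifferences(Map<K, ? extends Collection<V>> first,
                                             Map<K, ? extends Collection<V>> second) {
        Set<K> allKeys = new HashSet<>(first.keySet());
        allKeys.addAll(second.keySet());
        Set<K> differing = new HashSet<>();
        for (K key : allKeys) {
            Collection<V> a = first.get(key);
            Collection<V> b = second.get(key);
            if (a == null || b == null) {
                differing.add(key); // key present on one side only
            } else if (a.size() != b.size() || !countValues(a).equals(countValues(b))) {
                differing.add(key); // size check is a cheap shortcut before counting
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = new HashMap<>();
        first.put("foo", Arrays.asList("foo", "bar", "baz"));
        first.put("bar", Arrays.asList("foo"));
        first.put("baz", Arrays.asList("bar"));

        Map<String, List<String>> second = new HashMap<>();
        second.put("foo", Arrays.asList("foo", "bar"));
        second.put("baz", Arrays.asList("baz", "bar"));
        second.put("bar", Arrays.asList("foo"));

        // Should print a set containing foo and baz.
        System.out.println(keysWithDifferences(first, second));
    }
}
```

Counting each key's values in a HashMap keeps every key's comparison roughly linear in its number of values, instead of matching list elements pairwise.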
Approach Two:
Compare the two Multimaps directly, but I am not sure how that would be done.
Which approach would be faster? I need the operation to complete in as little time as possible.
Use Multimaps.filterEntries(Multimap, Predicate). To get the differences between two Multimaps, write a filter based on containsEntry and let the filtered view pick out every entry that has no match: build the Predicate from one multimap, then use it to filter the other. Note that with ArrayListMultimap, containsEntry scans the list of values stored under that key, so each lookup is linear in the number of values for that key.
Here's what I mean. This version uses Java 8 lambdas; a Java 7 version is given further down:
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import com.google.common.collect.Multimaps;

public class MultimapDifference {
    public static void main(String[] args) {
        Multimap<String, String> first = ArrayListMultimap.create();
        Multimap<String, String> second = ArrayListMultimap.create();

        first.put("foo", "foo");
        first.put("foo", "bar");
        first.put("foo", "baz");
        first.put("bar", "foo");
        first.put("baz", "bar");

        second.put("foo", "foo");
        second.put("foo", "bar");
        second.put("baz", "baz");
        second.put("bar", "foo");
        second.put("baz", "bar");

        // Entries of one multimap that the other does not contain.
        // filterEntries returns a live view, not a copy.
        Multimap<String, String> firstSecondDifference =
                Multimaps.filterEntries(first, e -> !second.containsEntry(e.getKey(), e.getValue()));
        Multimap<String, String> secondFirstDifference =
                Multimaps.filterEntries(second, e -> !first.containsEntry(e.getKey(), e.getValue()));

        System.out.println(firstSecondDifference);
        System.out.println(secondFirstDifference);
    }
}
In this contrived example, the output shows the entries that each multimap has and the other lacks:
{foo=[baz]}
{baz=[baz]}
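Both views will be empty if every entry in each multimap also appears in the other. Because containsEntry only checks presence, a value that occurs a different number of times under the same key (possible with ArrayListMultimap, which allows duplicate key/value pairs) is not reported.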
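Since containsEntry only tests whether a key/value pair is present at all, the filterEntries approach treats two "baz" values under "foo" in one multimap and a single one in the other as a match. If duplicate counts matter for the CSV data, the following is a minimal count-aware sketch (a bag difference per key). It is not a Guava API; the class and method names are hypothetical, it assumes the value type implements equals and hashCode, and it takes plain java.util maps so that views such as first.asMap() can be passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountAwareDifference {

    // For each key, returns the values that occur more often in `first` than in
    // `second`. Keys with nothing left over are omitted from the result.
    static <K, V> Map<K, List<V>> difference(Map<K, ? extends Collection<V>> first,
                                             Map<K, ? extends Collection<V>> second) {
        Map<K, List<V>> result = new HashMap<>();
        for (Map.Entry<K, ? extends Collection<V>> e : first.entrySet()) {
            Collection<V> other = second.get(e.getKey());
            // Count the other side's values once per key.
            Map<V, Integer> remaining = new HashMap<>();
            if (other != null) {
                for (V v : other) {
                    remaining.merge(v, 1, Integer::sum);
                }
            }
            List<V> extra = new ArrayList<>();
            for (V v : e.getValue()) {
                Integer left = remaining.get(v);
                if (left == null || left == 0) {
                    extra.add(v); // no unmatched copy of v left on the other side
                } else {
                    remaining.put(v, left - 1); // consume one matching copy
                }
            }
            if (!extra.isEmpty()) {
                result.put(e.getKey(), extra);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> first = new HashMap<>();
        first.put("foo", Arrays.asList("foo", "bar", "baz", "baz"));
        Map<String, List<String>> second = new HashMap<>();
        second.put("foo", Arrays.asList("baz", "foo", "bar"));

        // The second "baz" has no partner in `second`, so it is reported even
        // though a presence check for ("foo", "baz") would succeed.
        System.out.println(difference(first, second)); // {foo=[baz]}
        System.out.println(difference(second, first)); // {}
    }
}
```

Counting the other side's values once per key keeps the work roughly linear in the number of values, rather than scanning a list for every value.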
In Java 7, you can create the predicate manually, using something like this:
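import com.google.common.base.Predicate;
import com.google.common.collect.Multimap;
import java.util.Map;

// Top-level class in FilterPredicate.java; it can also be a static nested
// class of the class that contains main().
public class FilterPredicate<K, V> implements Predicate<Map.Entry<K, V>> {
    private final Multimap<K, V> filterAgainst;

    public FilterPredicate(Multimap<K, V> filterAgainst) {
        this.filterAgainst = filterAgainst;
    }

    // Keeps an entry only if the other multimap does not contain it.
    @Override
    public boolean apply(Map.Entry<K, V> entry) {
        return !filterAgainst.containsEntry(entry.getKey(), entry.getValue());
    }
}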
Use it as an argument to Multimaps.filterEntries() like this:
Multimap<String, String> firstSecondDifference =
        Multimaps.filterEntries(first, new FilterPredicate<String, String>(second));
Multimap<String, String> secondFirstDifference =
        Multimaps.filterEntries(second, new FilterPredicate<String, String>(first));
Otherwise, the code is the same (with the same result) as the Java 8 version above.