Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groovy: Difference with a.intersect( b ) and b.intersect( a )

Why in Groovy, when I create 2 lists, is there a difference if I do a.intersect( b ) and b.intersect( a ):

def list1 = ["hello", "world", "world"];
def list2 = ["world", "world", "world"];

println( "Intersect list1 with list2: " + list1.intersect( list2 ) );
println( "Intersect list2 with list1: " + list2.intersect( list1) );

traces:

Intersect list1 with list2: [world, world, world]
Intersect list2 with list1: [world, world]

(you can copy it here: http://groovyconsole.appspot.com/ if you want to test it)

If the arrays all contain unique elements, then it works as normal. Once you begin to add duplicates, it gets weird:

def list1 = ["hello", "world", "test", "test"];
def list2 = ["world", "world", "world", "test"];

println( "Intersect list1 with list2: " + list1.intersect( list2 ) );
println( "Intersect list2 with list1: " + list2.intersect( list1 ) );

traces:

Intersect list1 with list2: [world, world, world, test]
Intersect list2 with list1: [world, test, test]

I thought the whole point of intersect() was to give you the common elements, so it didn't matter which order you put them in?

If this isn't the case, how can I get only the common elements (expect duplicates in the array). E.g. example one should return ["world", "world"] and example two should return ["world", "test"]

Edit

To clarify a bit, this code should test that user data is still the same (assuming they disconnected in the middle of something and we want to make sure the data hasn't been tampered with, or is in the same state as before).

The order of the lists can't be guaranteed (the user could reorder it, but it's still technically the "same"), and duplicates are possible.

So something like: ["one", "one", "two"] should match ["two", "one", "one"], whereas any addition to the lists, or change in data shouldn't match.

like image 332
divillysausages Avatar asked Sep 12 '11 11:09

divillysausages


1 Answers

If you look at the source for Collection.intersect, you can see that the logic of the method follows this flow:

for two collections, left and right

  1. Swap left and right if left is smaller than right
  2. Add all of left into a Set (removes duplicates)
  3. For each element in right if it exists in the leftSet, then add it to the results

So, for your last 2 examples;

def array1 = ["hello", "world", "test", "test"]
def array2 = ["world", "world", "world", "test"]

array1.intersect( array2 ) would give (if we wrote that same algorithm in Groovy):

leftSet = new TreeSet( array1 ) // both same size, so no swap
// leftSet = [ 'hello', 'world', 'test' ]
right   = array2
result = right.findAll { e -> leftSet.contains( e ) }

Which (if you run it), you can see means result has the value [world, world, world, test] (as you found). This is because every element in right can be found in the leftSet

Not sure why the first example should return ["world","world"] though...

later...

So, what I think you are looking for would be something like this:

def array1 = ["hello", "world", "test", "test"]
def array2 = ["world", "world", "world", "test"]
def intersect1 = array1.intersect( array2 ) as TreeSet
def intersect2 = array2.intersect( array1 ) as TreeSet
assert intersect1 == intersect2

in order that you cope with the duplicates in the collections, as then both intersect1 and intersect2 will be equal to

[test, world]

later still

I believe this does what you want:

[array1,array2]*.groupBy{it}.with { a, b -> assert a == b }
like image 165
tim_yates Avatar answered Nov 17 '22 22:11

tim_yates