Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two Collections populated with different kinds of objects

I have a performance issue trying to compare two big Collections and I'm looking for some help to find a better way to do that.

The classes:

public class TypeOne {
   private int id;
}

and

public class TypeTwo {
   private int id;
}

The code:

Collection<TypeOne> oneColl = someMethodToPopulateThat();
Collection<TypeTwo> twoColl = anotherMethodToPopulateThat();

// Iterating both collections to match the elements by id
for(TypeOne one : oneColl) {
   for(TypeTwo two : twoColl) {
      if (one.getId().equals(two.getId())) 
         System.out.println(one.getId());
   }
}

I already tried to use some functions of Stream API without success.

Does anyone have any idea to solve this issue?

like image 921
Alexandre Mateus Avatar asked Oct 26 '25 14:10

Alexandre Mateus


2 Answers

tl;dr

ones.stream().forEach(
    one -> System.out.println(
        twos.stream().filter( two -> two.getId() == one.getId() ).findAny().toString()
    )
)

Details

I assume sorting as a NavigableSet will improve performance of our searching, though I’ve not verified this attempt at optimization works.

NavigableSet < TypeOne > ones = new TreeSet <>( Comparator.comparingInt( TypeOne :: getId ) );
ones.addAll( collectionOfOnes ) ;

NavigableSet < TypeTwo > twos = new TreeSet <>( Comparator.comparingInt( TypeTwo :: getId ) );
twos.addAll( collectionOfTwos ) ;

Loop one navigable set while searching for a match in the other.

for( TypeOne one : ones )
{
    Optional<TypeTwo> optionalTwo = twos.stream().filter( two -> two.getId() == one.getId() ).findAny() ;
    // handle your Optional which may or may not contain an object. 
}

stream vs parallelStream

Here is a full code example.

record TypeOne( int id ) { }
record TypeTwo( int id ) { }

final int limit = 50_000;
SequencedCollection < TypeOne > sourceOfOnes = IntStream.rangeClosed ( 1 , limit ).mapToObj ( TypeOne :: new ).toList ( );
SequencedCollection < TypeTwo > sourceOfTwos = IntStream.rangeClosed ( 1 , limit ).mapToObj ( TypeTwo :: new ).toList ( );

NavigableSet < TypeOne > ones = new TreeSet <> ( Comparator.comparingInt ( TypeOne :: id ) );
ones.addAll ( sourceOfOnes );

NavigableSet < TypeTwo > twos = new TreeSet <> ( Comparator.comparingInt ( TypeTwo :: id ) );
twos.addAll ( sourceOfTwos );

long start = System.nanoTime ( );
for ( TypeOne one : ones )
{
    Optional < TypeTwo > optionalTwo = twos.parallelStream ( ).filter ( two -> two.id ( ) == one.id ( ) ).findAny ( );
    if ( optionalTwo.isEmpty ( ) )
    {
        System.out.println ( "FAILED to find " + one );
    }
}
Duration elapsed = Duration.ofNanos ( System.nanoTime ( ) - start );
System.out.println ( "elapsed = " + elapsed );

No fancy testing, I just executed this around five times on a MacBook Pro (18,1) with Apple M1 Pro chip, 10 cores (8 performance and 2 efficiency), 16 GB RAM, macOS Sonoma 14.7.2, Java 23.0.1, IntelliJ IDEA 2024.3.1 (Ultimate Edition), relatively quiet (little other CPU use). I ran two batches of tests, each batch runs the stream in one thread versus in parallel:

  • twos.stream ( )
  • twos.parallelStream ( )

Results:

stream parallelStream
≈ 7.5 seconds ≈ 4 seconds
like image 147
Basil Bourque Avatar answered Oct 29 '25 05:10

Basil Bourque


If the property by which the collection elements are being compared is int we can use bitset api for better performance

        BitSet setOne = new BitSet();
        BitSet setTwo = new BitSet();

        List<TypeOne> l1 = java.util.Arrays.asList(new TypeOne(7),new TypeOne(1), new TypeOne(88));
        List<TypeTwo> l2 = java.util.Arrays.asList(new TypeTwo(1), new TypeTwo(98),new TypeTwo(7));

        l1.stream()
                .map(t->t.id)
                .forEach(setOne::set);
        
        for (TypeTwo typeTwo : l2) {
            setTwo.set(typeTwo.id);
        }
        setOne.and(setTwo);//find the commons in setOne
        setOne.stream().forEach(i-> System.out.println(i));
        
like image 25
Sagar Avatar answered Oct 29 '25 05:10

Sagar