Comparing two sorted int arrays

Question

I have millions of fixed-size (100) int arrays. Each array is sorted and has unique elements. For each array, I want to find all arrays which have 70% common elements. Right now I am getting around 1 million comparisons (using Arrays.binarySearch()) per second, which is too slow for us.

Can anyone recommend a better searching algorithm?

Sean Patrick Floyd · Accepted Answer

Something like this should do the job (provided that the arrays are sorted and contain unique elements):

public static boolean atLeastNMatchingElements(final int n,
    final int[] arr1,
    final int[] arr2){

    /* check assumptions */
    assert (arr1.length == arr2.length);

    final int arrLength = arr2.length;

    { /* optimization */
        final int maxOffset = Math.max(arrLength - n, 0);
        if(arr1[maxOffset] < arr2[0] || arr2[maxOffset] < arr1[0]){
            return false;
        }
    }

    int arr2Offset = 0;
    int matches = 0;

    /* declare variables only once, outside loop */
    int compResult; int candidate;

    for(int i = 0; i < arrLength; i++){
        candidate = arr1[i];
        while(arr2Offset < arrLength - 1){
            compResult = arr2[arr2Offset] - candidate;
            if(compResult > 0){
                break;
            } else{
                arr2Offset++;
                if(compResult == 0){
                    matches++;
                    break;
                }
            }
        }
        if(matches == n){
            return true;
        }
        /* optimization */
        else if(matches < n - (arrLength - arr2Offset)){
            return false;
        }
    }
    return false;
}

Sample usage:

public static void main(final String[] args){
    final int[] arr1 = new int[100];
    final int[] arr2 = new int[100];
    int x = 0, y = 0;
    for(int i = 0; i < 100; i++){
        if(i % 10 == 0){
            x++;
        }
        if(i % 12 == 0){
            y++;
        }
        arr1[i] = x;
        arr2[i] = y;
        x++;
        y++;
    }
    System.out.println(atLeastNMatchingElements(70, arr1, arr2));
    System.out.println(atLeastNMatchingElements(95, arr1, arr2));
}

Output:

true
false

Premature Optimizations™

I have now tried to optimize the above code. Please check whether the code blocks marked as

/* optimization */

make a noticeable difference. After optimization, I would refactor the code to get it down to one or two return statements.

biziclop · Answer

There are two quick optimisations you can make.

If array A's start element is greater than B's end element, they trivially can't have common elements.

They other one is a triangle inequality-like thing:

f(B,C) <= 100 - |f(A,B)-f(A,C)|

The reason for this is that (assuming f(A,B) > f(A,C)) there are at least f(A,B) - f(A,C) elements that are in both A and B but not in C. Which means that they can't be common elements of B and C.

Comparing two sorted int arrays

Tags:

java

arrays

comparison

int

ashish

2 Answers

Premature Optimizations™

Sean Patrick Floyd

biziclop

Recent Activity

Donate For Us

Comparing two sorted int arrays

Tags:

java

arrays

comparison

int

ashish

2 Answers

Premature Optimizations™

Sean Patrick Floyd

biziclop

Related questions

Recent Activity

Donate For Us