Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing two sorted int arrays

I have millions of fixed-size (100) int arrays. Each array is sorted and has unique elements. For each array, I want to find all arrays which have 70% common elements. Right now I am getting around 1 million comparisons (using Arrays.binarySearch()) per second, which is too slow for us.

Can anyone recommend a better searching algorithm?

like image 519
ashish Avatar asked Jan 21 '11 11:01

ashish


2 Answers

Something like this should do the job (provided that the arrays are sorted and contain unique elements):

public static boolean atLeastNMatchingElements(final int n,
    final int[] arr1,
    final int[] arr2){

    /* check assumptions */
    assert (arr1.length == arr2.length);

    final int arrLength = arr2.length;

    { /* optimization */
        final int maxOffset = Math.max(arrLength - n, 0);
        if(arr1[maxOffset] < arr2[0] || arr2[maxOffset] < arr1[0]){
            return false;
        }
    }

    int arr2Offset = 0;
    int matches = 0;

    /* declare variables only once, outside loop */
    int compResult; int candidate;

    for(int i = 0; i < arrLength; i++){
        candidate = arr1[i];
        while(arr2Offset < arrLength - 1){
            compResult = arr2[arr2Offset] - candidate;
            if(compResult > 0){
                break;
            } else{
                arr2Offset++;
                if(compResult == 0){
                    matches++;
                    break;
                }
            }
        }
        if(matches == n){
            return true;
        }
        /* optimization */
        else if(matches < n - (arrLength - arr2Offset)){
            return false;
        }
    }
    return false;
}

Sample usage:

public static void main(final String[] args){
    final int[] arr1 = new int[100];
    final int[] arr2 = new int[100];
    int x = 0, y = 0;
    for(int i = 0; i < 100; i++){
        if(i % 10 == 0){
            x++;
        }
        if(i % 12 == 0){
            y++;
        }
        arr1[i] = x;
        arr2[i] = y;
        x++;
        y++;
    }
    System.out.println(atLeastNMatchingElements(70, arr1, arr2));
    System.out.println(atLeastNMatchingElements(95, arr1, arr2));
}

Output:

true
false

Premature Optimizations™

I have now tried to optimize the above code. Please check whether the code blocks marked as

/* optimization */

make a noticeable difference. After optimization, I would refactor the code to get it down to one or two return statements.

like image 160
Sean Patrick Floyd Avatar answered Sep 19 '22 12:09

Sean Patrick Floyd


There are two quick optimisations you can make.

If array A's start element is greater than B's end element, they trivially can't have common elements.

They other one is a triangle inequality-like thing:

f(B,C) <= 100 - |f(A,B)-f(A,C)|

The reason for this is that (assuming f(A,B) > f(A,C)) there are at least f(A,B) - f(A,C) elements that are in both A and B but not in C. Which means that they can't be common elements of B and C.

like image 35
biziclop Avatar answered Sep 19 '22 12:09

biziclop