Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to return common elements from two string arrays

In Java, what's the most efficient way to return the common elements from two String Arrays? I can do it with a pair of for loops, but that doesn't seem to be very efficient. The best I could come up with was converting to a List and then applying retainAll, based on my review of a similar SO question:

List<String> compareList = Arrays.asList(strArr1);
List<String> baseList = Arrays.asList(strArr2);
baseList.retainAll(compareList);
like image 904
JW8 Avatar asked Dec 18 '11 23:12

JW8


3 Answers

EDITED:

This is a one-liner:

compareList.retainAll(new HashSet<String>(baseList));

The retainAll impl (in AbstractCollection) iterates over this, and uses contains() on the argument. Turning the argument into a HashSet will result in fast lookups, so the loop within the retainAll will execute as quickly as possible.

Also, the name baseList hints at it being a constant, so you will get a significant performance improvement if you cache this:

static final Set<String> BASE = Collections.unmodifiableSet(new HashSet<String>(Arrays.asList("one", "two", "three", "etc")));

static void retainCommonWithBase(Collection<String> strings) {
    strings.retainAll(BASE);
}

If you want to preserve the original List, do this:

static List<String> retainCommonWithBase(List<String> strings) {
   List<String> result = new ArrayList<String>(strings);
   result.retainAll(BASE);
   return result;
}
like image 186
Bohemian Avatar answered Oct 19 '22 17:10

Bohemian


I would use HashSets (and retainAll) then, which would make the whole check O(n) (for each element in the first set lookup if it exists (contains()), which is O(1) for HashSet). Lists are faster to create though (HashSet might have to deal with collisions...).

Keep in mind that Set and List have different semantics (lists allow duplicate elements, nulls...).

like image 22
milan Avatar answered Oct 19 '22 16:10

milan


Sort both arrays.

Once sorted, you can iterate both sorted arrays exactly once, using two indexes.

This will be O(NlogN).

like image 35
Burleigh Bear Avatar answered Oct 19 '22 17:10

Burleigh Bear