
Java Collection performance question

I have created a method that takes two Collection<String> as input and copies one to the other.

However, I am not sure if I should check if the collections contain the same elements before I start copying, or if I should just copy regardless. This is the method:

 /**
  * Copies from one collection to the other. Does not allow empty string. 
  * Removes duplicates.
  * Clears the dest collection first
  * @param src
  * @param dest
  */
 public static void copyStringCollectionAndRemoveDuplicates(Collection<String> src, Collection<String> dest) {
  if(src == null || dest == null)
   return;

  //Is this faster to do? Or should I just comment this block out
  if(src.containsAll(dest))
   return;

  dest.clear();
  Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
  for(String f : src) 
   if(!"".equals(f)) 
    uniqueSet.add(f);

  dest.addAll(uniqueSet);
 }

Maybe it is faster to just remove the

if(src.containsAll(dest))
    return;

Because this method will iterate over the entire collection anyway.

Shervin Asgari asked May 27 '10 07:05




4 Answers

I'd say: remove it! It's duplicate 'code': the Set performs the same contains() operation anyway, so there is no need to preprocess here. Unless you have a huge input collection and a brilliant O(1) test for containsAll() ;-)

The Set is fast enough. Filling it is O(n) in the size of the input (one contains() and maybe one add() operation for every String), and if the containsAll() test fails, contains() is done twice for each String, which is less performant.

EDIT

Some pseudo code to visualize my answer

void copy(source, dest) {
  bool:containsAll = true;
  foreach(String s in source) {  // iteration 1
    if (not s in dest) {         // contains() test
       containsAll=false
       break
    }
  }
  if (not containsAll) {
    foreach(String s in source) { // iteration 2
      if (not s in dest) {        // contains() test
        add s to dest
      }
    }
  }
}

If all source elements are in dest, then contains() is called once for each source element. If all but the last source element are in dest (worst case), then contains() is called 2n times (n = size of the source collection): n times in the first loop, which only fails on the last element, and n times in the second. Either way, the total number of contains() tests with the extra check is always equal to or greater than that of the same code without it.
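The call counts can be checked with a small experiment. The following is only a sketch and not part of the original answer: `ContainsCount`, `CountingList`, `copyWithPreCheck` and `copyWithoutPreCheck` are hypothetical names, and the loops mirror the pseudocode above rather than the JDK's own containsAll() implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ContainsCount {

    /** Hypothetical helper: an ArrayList that counts calls to contains(). */
    public static class CountingList extends ArrayList<String> {
        private static final long serialVersionUID = 1L;
        int containsCalls = 0;

        public CountingList(Collection<String> initial) {
            super(initial);
        }

        @Override
        public boolean contains(Object o) {
            containsCalls++;
            return super.contains(o);
        }
    }

    /** Mirrors the pseudocode: pre-check loop first, then the copy loop. */
    public static int copyWithPreCheck(List<String> source, CountingList dest) {
        boolean containsAll = true;
        for (String s : source) {          // iteration 1
            if (!dest.contains(s)) {
                containsAll = false;
                break;
            }
        }
        if (!containsAll) {
            for (String s : source) {      // iteration 2
                if (!dest.contains(s)) {
                    dest.add(s);
                }
            }
        }
        return dest.containsCalls;
    }

    /** The same copy loop without the pre-check. */
    public static int copyWithoutPreCheck(List<String> source, CountingList dest) {
        for (String s : source) {
            if (!dest.contains(s)) {
                dest.add(s);
            }
        }
        return dest.containsCalls;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");
        // Only the last source element is missing from dest (n = 3).
        System.out.println(copyWithPreCheck(source, new CountingList(Arrays.asList("a", "b"))));    // 6
        System.out.println(copyWithoutPreCheck(source, new CountingList(Arrays.asList("a", "b")))); // 3
    }
}
```

With three source elements and only the last one missing, the pre-check variant performs six contains() calls and the plain loop three. When every element is already present, both perform one call per element.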

EDIT 2

Let's assume we have the following collections:

source = {"", "a", "b", "c", "c"}
dest = {"a", "b"}

First, the containsAll test fails, because the empty String in source is not in dest (this is a small design flaw in your code ;)). Then you create a temporary set, which will be {"a", "b", "c"} (the empty String and the second "c" are ignored). Finally you add everything to dest and, assuming dest is a simple ArrayList, the result is {"a", "b", "a", "b", "c"}. Is that the intention? A shorter alternative:

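void copy(Collection<String> in, Collection<String> out) {
  // Copy into a set to drop duplicates (HashSet does not keep insertion order)
  Set<String> unique = new HashSet<String>(in);
  // Remove the empty string from the copy, not from the caller's input
  unique.remove("");
  out.addAll(unique);
}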

Andreas Dolk answered Nov 09 '22 23:11


The containsAll() would not help if target has more elements than dest:
target: [a,b,c,d]
dest: [a,b,c]
target.containsAll(dest) is true, so dest is [a,b,c] but should be [a,b,c,d].
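This is easy to reproduce. The sketch below copies the method as posted in the question (with its parameters named src and dest) and runs it on the two lists above. The class name `ContainsAllDemo` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ContainsAllDemo {

    // The method as posted in the question, including the early return.
    public static void copyStringCollectionAndRemoveDuplicates(
            Collection<String> src, Collection<String> dest) {
        if (src == null || dest == null) {
            return;
        }
        if (src.containsAll(dest)) {   // true whenever dest is a subset of src
            return;
        }
        dest.clear();
        Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
        for (String f : src) {
            if (!"".equals(f)) {
                uniqueSet.add(f);
            }
        }
        dest.addAll(uniqueSet);
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("a", "b", "c", "d");
        List<String> dest = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        copyStringCollectionAndRemoveDuplicates(src, dest);
        System.out.println(dest); // [a, b, c]: "d" was never copied
    }
}
```

The check only asks whether every element of dest also occurs in src, which is a subset test, not an equality test.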

I think the following code is more elegant:

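// LinkedHashSet drops duplicates and keeps insertion order
Set<String> uniqueSet = new LinkedHashSet<String>(target.size());
uniqueSet.addAll(target);
// remove() does nothing if the empty string is absent, so no contains() check is needed
uniqueSet.remove("");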

dest.addAll(uniqueSet);

Daniel Engmann answered Nov 10 '22 01:11


You could benchmark it, if it mattered that much. I think the call to containsAll() likely does not help, though it could depend on how often the two collections have the same contents.

But this code is confusing. It's trying to add new items to dest? So why does it clear it first? Just return your new uniqueSet to the caller instead. And isn't your containsAll() check reversed?
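Returning the set could look roughly like the following minimal sketch. The names `UniqueStrings` and `uniqueNonEmpty` are invented for illustration, and returning an empty set for a null input is an assumption; the question's method silently returns in that case.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class UniqueStrings {

    /**
     * Returns the distinct, non-empty strings of src in encounter order.
     * The input collection is not modified.
     */
    public static Set<String> uniqueNonEmpty(Collection<String> src) {
        Set<String> unique = new LinkedHashSet<String>();
        if (src == null) {
            return unique; // assumption: null input yields an empty result
        }
        unique.addAll(src);
        unique.remove("");
        return unique;
    }

    public static void main(String[] args) {
        Set<String> result = uniqueNonEmpty(Arrays.asList("", "a", "b", "c", "c"));
        System.out.println(result); // [a, b, c]
    }
}
```

The caller then decides what to do with the result, for example `dest.clear(); dest.addAll(result);`, and no pre-check is needed at all.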

Sean Owen answered Nov 10 '22 01:11


  1. The parameter names are confusing. dest and target have almost the same meaning. You'd be better off choosing something like dest and source; it will make things much clearer, even for you.

  2. I have a feeling (not sure that it's correct) that you are using the collections API in the wrong way. The Collection interface doesn't say anything about the uniqueness of its elements, but you add this quality to it.

  3. Modifying collections that are passed as parameters is not the best idea (but as usual, it depends). In general, mutability is harmful and unnecessary. Moreover, what if the passed collections are unmodifiable/immutable? It's better to return a new collection than to modify the incoming ones.

  4. The Collection interface has the methods addAll, removeAll and retainAll. Did you try them first? Have you run performance tests on code like:

    Collection<String> result = new HashSet<String> (dest);
    result.addAll (target);
    

    or

    target.removeAll (dest);
    dest.addAll (target);
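
The two variants from point 4 behave differently with respect to their arguments, which ties in with point 3. The following is a sketch for illustration only: the class and method names (`MergeVariants`, `union`, `mergeInPlace`) are hypothetical, and the parameter names follow the snippets above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeVariants {

    /** Variant 1: builds a new set; dest and target are left untouched. */
    public static Set<String> union(Collection<String> dest, Collection<String> target) {
        Set<String> result = new HashSet<String>(dest);
        result.addAll(target);
        return result;
    }

    /** Variant 2: mutates both arguments; target loses every element already in dest. */
    public static void mergeInPlace(Collection<String> dest, Collection<String> target) {
        target.removeAll(dest);
        dest.addAll(target);
    }

    public static void main(String[] args) {
        List<String> dest = new ArrayList<String>(Arrays.asList("a", "b"));
        List<String> target = new ArrayList<String>(Arrays.asList("b", "c"));

        System.out.println(union(dest, target).size()); // 3 (a, b, c); inputs unchanged

        mergeInPlace(dest, target);
        System.out.println(dest);   // [a, b, c]
        System.out.println(target); // [c]

        // Variant 2 cannot work on a read-only destination.
        try {
            mergeInPlace(Collections.unmodifiableList(dest),
                         new ArrayList<String>(Arrays.asList("x")));
        } catch (UnsupportedOperationException e) {
            System.out.println("cannot modify an unmodifiable collection");
        }
    }
}
```

Which variant is faster depends on the collection types involved, so it is still worth measuring with realistic data.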
    

Roman answered Nov 10 '22 00:11