I have created a method that takes two Collection<String> as input and copies one to the other.
However, I am not sure whether I should check if the collections contain the same elements before I start copying, or whether I should just copy regardless. This is the method:
/**
* Copies from one collection to the other. Does not allow empty string.
* Removes duplicates.
* Clears the dest Collection first
* @param src
* @param dest
*/
public static void copyStringCollectionAndRemoveDuplicates(Collection<String> src, Collection<String> dest) {
if(src == null || dest == null)
return;
//Is this faster to do? Or should I just comment this block out
if(src.containsAll(dest))
return;
dest.clear();
Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
for(String f : src)
if(!"".equals(f))
uniqueSet.add(f);
dest.addAll(uniqueSet);
}
Maybe it is faster to just remove this check:
if(src.containsAll(dest))
return;
because this method will iterate over the entire collection anyway.
I'd say: Remove it! It's duplicate 'code': the Set already performs the same contains() operation, so there is no need to preprocess here. Unless you have a huge input collection and a brilliant O(1) test for containsAll() ;-)
The Set is fast enough. It has O(n) complexity in the size of the input (one contains() and possibly one add() operation for every String), and if the target.containsAll() test fails, contains() is done twice for each String, which is less performant.
EDIT
Some pseudo code to visualize my answer
void copy(source, dest) {
bool:containsAll = true;
foreach(String s in source) { // iteration 1
if (not s in dest) { // contains() test
containsAll=false
break
}
}
if (not containsAll) {
foreach(String s in source) { // iteration 2
if (not s in dest) { // contains() test
add s to dest
}
}
}
}
If all source elements are in dest, then contains() is called once for each source element. If all but the last source element are in dest (worst case), then contains() is called 2n times (n = size of the source collection): n times in the first loop, which only fails on the last element, and n times in the second. Either way, the total number of contains() tests with the extra check is always equal to or greater than in the same code without it.
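To check the counting argument above, here is a minimal Java sketch that counts contains() calls with and without the pre-check. The class ContainsCallCounter and the CountingSet helper are hypothetical names made up for this illustration, and dest is assumed to be a HashSet, which may not match your real collections.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class ContainsCallCounter {

    /** HashSet that counts calls to contains(); hypothetical helper, for illustration only. */
    static class CountingSet extends HashSet<String> {
        int calls = 0;

        CountingSet(Collection<String> initial) {
            super(initial);
        }

        @Override
        public boolean contains(Object o) {
            calls++;
            return super.contains(o);
        }
    }

    /** Runs containsAll() first, then the contains()/add() loop; returns the number of contains() calls. */
    static int withPreCheck(List<String> source, Collection<String> destContent) {
        CountingSet dest = new CountingSet(destContent);
        if (!dest.containsAll(source)) {       // iteration 1 (calls contains() per element)
            for (String s : source) {          // iteration 2
                if (!dest.contains(s)) {
                    dest.add(s);
                }
            }
        }
        return dest.calls;
    }

    /** Runs only the contains()/add() loop; returns the number of contains() calls. */
    static int withoutPreCheck(List<String> source, Collection<String> destContent) {
        CountingSet dest = new CountingSet(destContent);
        for (String s : source) {
            if (!dest.contains(s)) {
                dest.add(s);
            }
        }
        return dest.calls;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c", "d");
        List<String> dest = Arrays.asList("a", "b", "c"); // all but the last element present
        System.out.println(withPreCheck(source, dest));    // prints 8
        System.out.println(withoutPreCheck(source, dest)); // prints 4
    }
}
```

When dest already holds every source element, both variants make the same number of calls, so the pre-check never comes out ahead in this model.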
EDIT 2 Let's assume we have the following collections:
source = {"", "a", "b", "c", "c"}
dest = {"a", "b"}
First, the containsAll test fails, because the empty String in source is not in dest (this is a small design flaw in your code ;)). Then you create a temporary set, which will be {"a", "b", "c"} (empty String and second "c" ignored). Finally you add everything to dest and, assuming dest is a simple ArrayList, the result is {"a", "b", "a", "b", "c"}. Is that the intention? A shorter alternative:
void copy(Collection<String> in, Collection<String> out) {
Set<String> unique = new HashSet<String>(in);
unique.remove("");
out.addAll(unique);
}
The containsAll() check would not help if target has more elements than dest:
target: [a,b,c,d]
dest: [a,b,c]
target.containsAll(dest) is true, so dest stays [a,b,c] but should be [a,b,c,d].
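The early return described above can be reproduced with a small Java sketch. It restates the method from the question in two variants; the class name PreCheckDemo and the method names copyWithPreCheck / copyWithoutPreCheck are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PreCheckDemo {

    /** The method from the question, including the containsAll() pre-check. */
    static void copyWithPreCheck(Collection<String> src, Collection<String> dest) {
        if (src == null || dest == null) {
            return;
        }
        if (src.containsAll(dest)) {
            return; // returns early even when src has extra elements
        }
        copyWithoutPreCheck(src, dest);
    }

    /** Same method with the pre-check removed. */
    static void copyWithoutPreCheck(Collection<String> src, Collection<String> dest) {
        if (src == null || dest == null) {
            return;
        }
        dest.clear();
        Set<String> uniqueSet = new LinkedHashSet<String>(src.size());
        for (String f : src) {
            if (!"".equals(f)) {
                uniqueSet.add(f);
            }
        }
        dest.addAll(uniqueSet);
    }

    public static void main(String[] args) {
        List<String> target = Arrays.asList("a", "b", "c", "d");

        List<String> dest1 = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        copyWithPreCheck(target, dest1);
        System.out.println(dest1); // prints [a, b, c]  ("d" was never copied)

        List<String> dest2 = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        copyWithoutPreCheck(target, dest2);
        System.out.println(dest2); // prints [a, b, c, d]
    }
}
```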
I think the following code is more elegant:
Set<String> uniqueSet = new LinkedHashSet<String>(target.size());
uniqueSet.addAll(target);
if(uniqueSet.contains(""))
uniqueSet.remove("");
dest.addAll(uniqueSet);
You could benchmark it, if it mattered that much. I think the call to containsAll() likely does not help, though it could depend on how often the two collections have the same contents.
But this code is confusing. Is it trying to add new items to dest? Then why does it clear it first? Just return your new uniqueSet to the caller instead of bothering. And isn't your containsAll() check reversed?
Too many confusing parameter names. dest and target have almost the same meaning. You'd better choose something like dest and source. It'll make things much clearer, even for you.
I have a feeling (not sure that it's correct) that you use the collections API in the wrong way. The Collection interface doesn't say anything about the uniqueness of its elements, but you add this quality to it.
Modifying collections that are passed as parameters is not the best idea (but as usual, it depends). In the general case, mutability is harmful and unnecessary. Moreover, what if the passed collections are unmodifiable/immutable? It's better to return a new collection than to modify the incoming ones.
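As a rough sketch of the "return a new collection" idea, the method below leaves its argument untouched and hands back a fresh set. The class name UniqueCopy and the method name uniqueNonEmpty are hypothetical; choosing LinkedHashSet to keep first-seen order is an assumption carried over from the question's code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UniqueCopy {

    /**
     * Returns a new set with the unique, non-empty strings of src in first-seen order.
     * src itself is never modified, so unmodifiable inputs are fine.
     */
    static Set<String> uniqueNonEmpty(Collection<String> src) {
        Set<String> result = new LinkedHashSet<String>();
        if (src == null) {
            return result; // empty result instead of silently doing nothing
        }
        for (String s : src) {
            if (!"".equals(s)) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> source =
                Collections.unmodifiableList(Arrays.asList("", "a", "b", "c", "c"));
        System.out.println(uniqueNonEmpty(source)); // prints [a, b, c]
    }
}
```

The caller then decides what to do with the result, for example dest.clear() followed by dest.addAll(result), which keeps the mutation visible at the call site.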
The Collection interface has the methods addAll, removeAll and retainAll. Did you try them first? Have you made performance tests for code like:
Collection<String> result = new HashSet<String> (dest);
result.addAll (target);
or
target.removeAll (dest);
dest.addAll (target);
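The two snippets above behave differently, which a small Java sketch can make visible. The class name BulkOpsDemo and the method names unionCopy / appendMissing are invented for this illustration, and ArrayList inputs are an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

class BulkOpsDemo {

    /** First variant: builds a new HashSet, leaving both arguments untouched. */
    static Collection<String> unionCopy(Collection<String> dest, Collection<String> target) {
        Collection<String> result = new HashSet<String>(dest);
        result.addAll(target);
        return result;
    }

    /** Second variant: mutates BOTH arguments; duplicates inside target survive. */
    static void appendMissing(Collection<String> dest, Collection<String> target) {
        target.removeAll(dest);
        dest.addAll(target);
    }

    public static void main(String[] args) {
        List<String> dest = new ArrayList<String>(Arrays.asList("a", "b"));
        List<String> target = new ArrayList<String>(Arrays.asList("a", "b", "c", "c"));

        // Unique elements of both collections; HashSet gives no ordering guarantee.
        System.out.println(unionCopy(dest, target));

        appendMissing(dest, target);
        System.out.println(target); // prints [c, c]
        System.out.println(dest);   // prints [a, b, c, c]
    }
}
```

So the second variant is only equivalent if target has no internal duplicates and the caller does not mind target being modified.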