I have a couple of questions about how the add function of a Java Collection handles strings. For example, in the below code snippet, I am copying a List of strings to a HashSet. What is the worst-case total time complexity in this case? Is it O(M x N) or O(N), where M is the max length of any string in the list, and N is the total number of strings in the list.
public HashSet<String> createDict(List<String> wordList) {
HashSet<String> wordDict = new HashSet<>();
for(String word : wordList) {
wordDict.add(word);
}
return wordDict;
}
Will the time complexity be exactly the same if I use the below code instead of the loop?
HashSet<String> wordDict = new HashSet<>(wordList);
Length of Strings has nothing to do with copying elements between collections. In fact, you don't copy Strings themselves but references to them. So the complexity will be O(N).
When it comes to the second question about new HashSet<>(wordList) - this call will be faster than than doing a loop. The reason for that is that in HashSet(Collection) constructor it first checks the size of that collection and starts with initialCapacity based on that. This way it doesn't have to resize the underlying HashMap that often.
For those who are curious and too lazy to search, this is HashSets constructor in question:
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
And addAll from AbstractCollection:
public boolean addAll(Collection<? extends E> c) {
boolean modified = false;
for (E e : c)
if (add(e))
modified = true;
return modified;
}
So if you were to set initialCapacity in your example code you will get the same performance, like so:
public HashSet<String> createDict(List<String> wordList) {
int initialCapacity = Math.max((int) (wordList.size()/.75f) + 1, 16);
HashSet<String> wordDict = new HashSet<>(initialCapacity );
for(String word : wordList) {
wordDict.add(word);
}
return wordDict;
}
The complexity will be O(N).
Adding an item to an HashSet is O(1) and it won't compare strings char by char, which is probably how you would get O(MxN).
Yes, creating the HashSet passing the list in the constructor will have the same complexity.
Actually you can check HashSet implementation code and it does exactly the same thing you did, except for a more optimised object creation based on your list size.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With