Is there any good reference on, or can someone tell me more about, the performance of various Java set implementations on small sets (say 1-100 elements)? The O(1) vs O(log n) story is pretty much irrelevant for these sizes, but since I need to handle millions of these small sets, performance certainly does matter. Most references I find don't mention much about this.
I would need to do the following with these sets (only a few times per set usually):
hashCode()
of the entire setI think these are the viable options to compare (assumed comparing/hashing T is almost free):
hashCode()
)Collections.sort()
needed...Which of the above is generally preferred? Or should I write my own SmallSet<T>
class?
The set interface is present in java.util package and extends the Collection interface is an unordered collection of objects in which duplicate values cannot be stored. It is an interface that implements the mathematical set.
An ArrayList is a built-in data structure that uses a dynamic array to store its elements. In order to use this data structure, you must import java. util. ArrayList at the top of your program.
Core Java bootcamp program with Hands on practice The array is faster in case of access to an element while List is faster in case of adding/deleting an element from the collection.
A Set is a generic set of values with no duplicate elements. A TreeSet is a set where the elements are sorted. A HashSet is a set where the elements are not sorted or ordered.
This is a small Set implementation as arrays:
It is so easy to adopt to your needs :)
Source: https://highlyscalable.wordpress.com/2011/12/29/ultimate-sets-and-maps-for-java-p1/
public class ArraySet {
private int[] array;
private int size = 0;
public ArraySet(int capacity) {
array = new int[capacity];
Arrays.fill(array, -1);
}
public boolean add(int key) {
int index = Arrays.binarySearch(array, 0, size, key);
if (index < 0) {
int insertIndex = -index-1;
if(size < array.length - 1) {
if(insertIndex < size) {
System.arraycopy(array, insertIndex, array, insertIndex + 1, size - insertIndex);
}
array[insertIndex] = key;
} else {
int[] newArray = new int[array.length + 1];
System.arraycopy(array, 0, newArray, 0, insertIndex);
System.arraycopy(array, insertIndex, newArray, insertIndex + 1, array.length - insertIndex);
newArray[insertIndex] = key;
array = newArray;
}
size++;
return true;
}
return false;
}
public int get(int position) {
return array[position];
}
public int size() {
return size;
}
public boolean contains(int key) {
return Arrays.binarySearch(array, key) >= 0;
}
}
If you are really looking for performance, then there's nothing short of testing for yourself that will help you here:
You'll need to setup a testcase that is similar to your actual use - test long enough that GC kicks in and you see the effects there.
And if you detect critical differences between them, rerun the tests after each and every update of the JVM as the implementations might change.
Until you have done such performance tests, I will give my standard advice: Choose the best readable option and only change that when there are obvious gains from using a less readable. The code maintainers (might be a future you) will thank you for that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With