I read about how TreeSet is slower than HashSet, (that adding elements into a TreeSet is slower) so i made a performance test, i'm trying to find out if it's better to add elements to HashSet and then move them into a TreeSet or put them in there in the first place. It looks like that inserting elements into a HashSet is faster but only when i insert a large amount of elements, why? I've read that if i don't need the elements sorted, always use HashSet, but apparently, sometimes it's slower.
When I insert a concrete value("1") instead of random numbers, TreeSet is also faster because there is no sorting, so how can i know when to use HashSet or TreeSet?
And my second question, when i create TreeSet like this, why don't i have access to "NavigableSet" methods?
Set<Integer> treeSet = new TreeSet<Integer>(); //cant type treeSet.lower(e: E)
TreeSet<Integer> treeSet = new TreeSet<Integer>(); //can type treeSet.lower(e: E)
Thanks for helping me with this.
Here are the results:
5 000 000 (random numbers)
5 000 000 (numbers "1")
500 000 (random numbers)
50 000 (random numbers)
Here is my code:
package performancetest;
import java.text.DecimalFormat;
import java.util.HashSet;
import java.util.InputMismatchException;
import java.util.Scanner;
import java.util.TreeSet;
public class HashSet_vs_TreeSet {
private static DecimalFormat df = new DecimalFormat("#.#####");
private static double hashTime, treeTime, hashToTreeTime;
private static int numbers = 0;
public static void main(String[] args){
start();
hashSetTest();
treeSetTest();
fromHashToTreeSet();
difference();
}
/**
* in this method, instead of "System.exit(1)" i try'ed "start()" but when i did lets say three mistakes, after correct input the
* methods hashSetTest();treeSetTest();fromHashToTreeSet();difference(); were running 4 times... i made this method exactly for the
* purpose to repeat itself in case of a input mistake.
*/
public static void start(){
System.out.print("Enter a number(a advise bigger or equal to 50 000): ");
Scanner scan = new Scanner(System.in);
try{
numbers = scan.nextInt();
}catch(InputMismatchException e){
System.out.println("ERROR: You need to enter a number");
System.exit(1);
}
System.out.println(numbers + " numbers in range from 0-99 was randomly generated into " +
"\n1)HashSet\n2)TreeSet\n3)HashSet and then moved to TreeSet\n");
}
public static void hashSetTest(){
/**
* i try'ed HashSet<Integer> hashSet = new HashSet<Integer>();
* but changing the load factor didn't make any difference, what its good for then ?
*/
HashSet<Integer> hashSet = new HashSet<Integer>(5000,(float) 0.5);
double start = System.currentTimeMillis() * 0.001;
for (int i = 0; i < numbers; i++) {
hashSet.add((int)(Math.random() * 100));
}
hashTime = (System.currentTimeMillis() * 0.001) - start;
System.out.println("HashSet takes " + df.format(hashTime) + "s");
}
public static void treeSetTest(){
TreeSet<Integer> treeSet = new TreeSet<Integer>();
double start = System.currentTimeMillis() * 0.001;
for (int i = 0; i < numbers; i++) {
treeSet.add((int)(Math.random() * 100));
}
treeTime = (System.currentTimeMillis() * 0.001) - start;
System.out.println("TreeSet takes " + df.format(treeTime) + "s");
}
public static void fromHashToTreeSet(){
HashSet<Integer> hashSet = new HashSet<Integer>();
double start = System.currentTimeMillis() * 0.001;
for (int i = 0; i < numbers; i++) {
hashSet.add((int)(Math.random() * 100));
}
TreeSet<Integer> treeSet = new TreeSet<Integer>(hashSet);
hashToTreeTime = (System.currentTimeMillis() * 0.001) - start;
System.out.println("TreeSet from HashSet takes " + df.format(hashToTreeTime) + "s");
}
public static void difference(){
double differenceSec = 0;
double differenceTimes = 0;
if(hashTime < treeTime){
differenceSec = (treeTime - hashTime);
differenceTimes = (treeTime / hashTime);
System.out.println("\nHashSet takes " + df.format(differenceSec) + "s less then TreeSet, it is " + df.format(differenceTimes) + " times faster");
}else{
differenceSec = (hashTime - treeTime);
differenceTimes = (hashTime / treeTime);
System.out.println("\nTreeSet takes " + df.format(differenceSec) + "s less then HashSet, it is " + df.format(differenceTimes) + " times faster");
}
}
}
Well, when you talk about peformance of TreeSet and HashSet you should clearly understand how these structures are organized what consequences of its organization.
Typically TreeSet is a structure where all elements are organized in a binary tree. Thus adding a member or accessing it is ~O(log(N))
.
In other hand HashSet is a structure similar to an array. The difference is that in an array index is an unique number, while in a HashSet each key needs to be translated into index with the help of a hash function. A hash function may produce the same results for different input data, the situation is called hash collision
. A good hash function (yes, they could be bad and good) produces as many unique results on a given set of input data as possible.
So accessing data in a hash set costs calculations of a hash function (in Java usually this is .hashCode()
) and possible conflict resolution. That is its estimated as O(1) i.e. a constant number of operations.
You should understand that O(1) is not always less than O(log(n)), it's less asymptotically and on big enough n
. Also a proper choice of a hash function does matter.
0) JVM benchmarking is really complicated. Almost always you're measuring not what you're thinking you are measuring. There's http://openjdk.java.net/projects/code-tools/jmh/ for microbenchmarking from guys from Oracle. And you may try some benchmarking frameworks and guides.
JIT compiler warmup, initial memory allocation, garbage collector and a LOT of other things may invalidate your benchmark.
1) See also Hashset vs Treeset regarding your first question
2) Set<Integer> treeSet = new TreeSet<Integer>(); //cant type treeSet.lower(e: E)
That's how it works. You declare treeSet as Set. Set does not extends NavigableSet. You may explicitly cast if you want to. But if you want to access NavigableSet methods why wouldn't you declare treeSet as NavigableSet
Set<Integer> treeSet = new TreeSet<Integer>();
Integer lower = ((NavigableSet) treeSet).lower(); // thus should work
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With