Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is this HashSet producing sorted output?

Tags:

java

hashset

The following code produces the out put [1,2]even though hashset is not sorted.

Set set = new HashSet();
set.add(new Integer(2));
set.add(new Integer(1));
System.out.println(set);

Why is that?

like image 880
Ashwin Avatar asked Sep 06 '13 01:09

Ashwin


2 Answers

EDIT: As of Java 8 and later, the following is no longer applicable. This proves that you shouldn't rely on undocumented Java behaviours.


This behaviour is caused by several separate reasons:

  • Integers hash to themselves
  • in Java, HashMaps and HashSets are backed up by an array
  • they also modify hashes using the higher bits to modify the lower bits; if the hash is in range 0..15, it's therefore not modified
  • what bucket an object goes depends on the lower bits of the modified hash
  • when iterating over a map or a set, the inner table is scanned sequentially

So if you add several small (<16) integers to a hashmap/hashset, this is what happens:

  • integer i has hashcode i
  • since it's less than 16, it's modified hash is also i
  • it lands in the bucket no. i
  • when iterating, the buckets are visited sequentially, so if all you stored there are small integers, they will be retrieved in ascending order

Note that if the initial number of buckets is too small, the integers may land in buckets not numbered after them:

HashSet<Integer> set = new HashSet<>(4);
set.add(5); set.add(3); set.add(1);
for(int i : set) {
  System.out.print(i);
}

prints 153.

like image 71
Karol S Avatar answered Nov 15 '22 14:11

Karol S


A HashSet as per the documentation does not guarantee any concept of order, so what you're seeing could very well change in a future update of Java.

However, if you're wondering why Java's (as of now) specific implementation of HashSet produces the result you're seeing: it's because the Integer of value 1 hashes to a location in the internal entry table of a HashMap that comes before the location to which 2 hashes (note that a HashSet is really backed by a HashMap with arbitrary values). This makes sense since the hash code of an Integer object is just its value.

In fact, you can see this even if you add even more numbers (within a certain range: the size of the entry table which is 16 by default):

Set<Integer> set = new HashSet<>();
set.add(2);
set.add(1);
set.add(4);
set.add(3);
set.add(0);
System.out.println(set);
[0, 1, 2, 3, 4]

Iteration over a HashSet takes place by iterating over the internal entry table, which means items earlier in the table come first.

like image 27
arshajii Avatar answered Nov 15 '22 16:11

arshajii