Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Duplicate elements in java.util.Set

Tags:

java

set

java.util.Set implementations removes the duplicate elements.

How are duplicates elements deleted internally in a java.util.Set?

like image 417
Neo Avatar asked Oct 29 '09 08:10

Neo


3 Answers

Actually AFAIK from the sources most Set implementations in java don't even check if the element is already contained.

They just always execute the add() on their internal structure which holds the set elements and let that object handle the duplication case.

e.g. HashSet calls put(K,V) on the internal HashMap which just inserts the new object overwriting the old entry if duplicate.

like image 81
jitter Avatar answered Oct 20 '22 00:10

jitter


Reading a little into your question I'm guessing that you're seeing strange behaviour with a java.util.HashSet (typically what everyone uses by default).

Contary to the contract of java.util.Set it is possible to get the same object in a java.util.HashSet twice like this:

import java.util.HashSet;
import java.util.Set;

public class SetTest 
{
  public static void main(String[] args) 
  {
    MyClass myObject = new MyClass(1, "testing 1 2 3");

    Set<MyClass> set = new HashSet<MyClass>();
    set.add(myObject);

    myObject.setHashCode(2);
    set.add(myObject);

    System.out.println(set.size());  // this will print 2.
  }

  private static class MyClass 
  {
    private int hashCode;
    private String otherField;

    public MyClass(int hashCode, String otherField) 
    {    
      this.hashCode = hashCode;
      this.otherField = otherField;
    }

    public void setHashCode(int hashCode) 
    {
      this.hashCode = hashCode;
    }

    public boolean equals(Object obj) 
    {    
      return obj != null && obj.getClass().equals(getClass()) && ((MyClass)obj).otherField.equals(otherField);
    }

    public int hashCode() 
    {
      return hashCode;
    }
  }
}

After the pointer from @jitter and a look at the source you can see why this would happen.

Like @jitter says, the java.util.HashSet uses a java.util.HashMap internally. When the hash changes between the first and second add a different bucket is used in the java.util.HashMap and the object is in the set twice.

The code sample may look a little contrieved but I've seen this happen in the wild with domain classes where the hash is created from mutable fields and the equals method hasn't been kept in sync with those fields.

like image 32
Nick Holt Avatar answered Oct 19 '22 23:10

Nick Holt


An easy way to find this out is to look in the source for the code you are interested in.

Each JDK has a src.zip included which contains the source code for the public classes so you can just locate the source for HashSet and have a look :) I often use Eclipse for this. Start it, create a new Java project, set the JVM to be an installed JDK (if not you are using the system default JRE which doesn't have src.zip), and Ctrl-Shift-T to go to HashSet.

like image 35
Thorbjørn Ravn Andersen Avatar answered Oct 20 '22 00:10

Thorbjørn Ravn Andersen