Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to effectively remove updated HashSet items

Tags:

java

hashset

Given the following snippet, how do we effectively remove elements on which have been previously updated/changed?

public static class Foo {
    @Override
    public int hashCode() {
        return new Random().nextInt();
    }
}

public static void main(String[] args) {
    Set<Foo> set = new HashSet<>();

    set.add(new Foo());
    set.removeIf(f -> true); // Returns true, but no deletion occurs

    assert set.size() == 0; // Fails as set still contains it's single item
}

Note: The above snippet is intended to simulate a different Foo upon next call to Object::hashCode (on Set::remove and Set::removeIf).

EDIT:

For those who did not understand the "random hash" part, here is a different view of the problem stated above:

public static class Bar {

    public String firstName;
    public String lastName;

    public Bar() {
        this(null, null);
    }

    public Bar(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        int result = 1;

        result *= 59 + (firstName == null ? 43 : firstName.hashCode());
        result *= 59 + (lastName == null ? 43 : lastName.hashCode());

        return result;
    }

}

public static void main(String[] args) {
    Set<Bar> set = new HashSet<>();

    String originalFirstName = "FOO";
    String updatedFirstName = "FOO_CHANGED";

    // Create bar
    Bar bar = new Bar();
    bar.firstName = originalFirstName;
    bar.lastName = "BAR";

    // Add bar
    set.add(bar);

    // Change bar
    System.out.println("Bar hash (now): " + bar.hashCode());
    bar.firstName = updatedFirstName;
    System.out.println("Bar hash (new): " + bar.hashCode());

    Bar oldBar = new Bar(originalFirstName, bar.lastName);
    Bar changedBar = new Bar(bar.firstName, bar.lastName);

    System.out.println("Old bar hash:     " + oldBar.hashCode()); // Hash matches old value
    System.out.println("Changed bar hash: " + changedBar.hashCode()); // Hash matches new value

    set.remove(oldBar); // Removes no elements (returns false)
    set.remove(changedBar); // Removes no elements (returns false)
    set.removeIf(f -> true); // Removes no elements (returns true)

    Iterator<Bar> iterator = set.iterator();

    while (iterator.hasNext()) {
        iterator.next();
        iterator.remove(); // Fails silently
    }

    assert set.size() == 0;
}

There's no random hash at all.

There are different hashes indeed, but apparently the elements can never be removed if they have ever been changed (therefore, their hash), regardless what. We can confirm that on both Set::remove calls, where set.remove(oldBar) should have removed the element as oldBar hash equals to the hash when bar was added.

like image 469
Henri Avatar asked Jan 21 '21 19:01

Henri


People also ask

How do I remove all elements from a HashSet?

HashSet. clear() method is used to remove all the elements from a HashSet. Using the clear() method only clears all the element from the set and not deletes the set. In other words, we can say that the clear() method is used to only empty an existing HashSet.

How HashSet detect duplicates?

If the hashcode of two objects are equal then hashset uses equal() to see if the hashcode matched objects are really equal. And if they are equal the hashset knows that the new object is duplicate of something exist in the HashSet. And the add does not happen. The add() of hashcode returns false.

Is HashSet remove O 1?

HashSet 's remove() takes O(1) expected time to locate the element to be removed - the hashCode() takes you to the bin that contains the element, and each bin is expected to have a small number of entries, so finding the element in the bin and removing it should take constant time.

How to remove object from HashSet in Java?

We can use remove () method of HashSet or by use of iterator. Let’s discuss both ways in java remove from HashSet The remove (Object o) method is used to remove the given object o from the HashSet. It returns boolean type value either true or false. If the given object exists in HashSet then it will remove the object and return true.

What is the difference between list and HashSet in Java?

As far as performance is concerned, it is better in comparison to the list. HashSet .Clear Method is used to remove all the elements from a HashSet object. Here, mySet is the name of the HashSet. Below given are some examples to understand the implementation in a better way:

How to remove an object from HashMap?

So when we remove the object from HashSet by use of remove (Object o) method, it internally removes the object from HashMap.It checks whether the object if present or not? If the object exists in HashSet it will remove it and return true otherwise false.

What happens if HashSet<T> does not contain the specified element?

If the HashSet<T> object does not contain the specified element, the object remains unchanged. No exception is thrown. This method is an O (1) operation.


3 Answers

As all other answers and comments, firstly, I should say that hashCode should remain consistent: it supposed remain the same when an element is stored in a hash based collection.


Amusingly, the code snippet in OpenJDK 11 will return 0 when the set's size is queried, but on OpenJDK 8 it will remain 1.

This happened due to changes in the standard library (see JDK-8170733):

  • The removeIf in HashMap#keySet (HashSet uses a HashMap underneath) is not overridden, so it relies on Iterator#remove.
  • Implementation of the latter method has been changed to avoid recomputing the hashCode inside the HashMap.HashIterator::remove method.
  • So the removeIf will successfully remove elements
  • See this commit:
    -            K key = p.key;
    -            removeNode(hash(key), key, null, false, false);
    +            removeNode(p.hash, p.key, null, false, false);
    

Once again: do not rely rely on this implementation detail

like image 144
Denis Zavedeev Avatar answered Sep 20 '22 18:09

Denis Zavedeev


The problem here is that if you modify the object in such a way that the hashcode is different then it is no longer structurally the same object. Another way to say this, original.equals(modified) is false (or at least should be due to the contracts of equals() and hashCode(). One solution is to modify hashCode() to calculate based on some invariant. In other words, the returned hashcode is only based on the identifying data in a Foo object that will never change no matter what. For example, this could be an id, such as for an object that maps to an underlying database table.

Alternatively, you could find a different data structure that matches your use case better. For example an ArrayList might be more appropriate since you can remove items at a given index regardles of the state of that object.

like image 21
Code-Apprentice Avatar answered Sep 20 '22 18:09

Code-Apprentice


Not consistent hashCode is wrong.

However I could not understand how it should affect you in your case as you invoke removeIf which iterates all elements of the set.

So I tried it using JAVA 11 and it worked. Set was emptied and size returned was 0 as expected. I am curious of what configurations you use.

   public static void main(String[] args){

        Set<Foo> s = new HashSet<>();
        s.add(new Foo("user1", 3));
        s.add(new Foo("user2", 5));

        s.forEach( e -> System.out.println(e));

        s.removeIf(f-> true);

        s.forEach( e ->System.out.println(e));

        System.out.println(s.size());

    }

enter image description here

like image 38
Panagiotis Bougioukos Avatar answered Sep 22 '22 18:09

Panagiotis Bougioukos