Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AbstractSet.removeAll() method not working properly

The code shown below does output:

[b]

[a, b]

However I would expect it to print two identical lines in the output.

import java.util.*;

public class Test{
    static void test(String... abc) {
        Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b"));
        s.removeAll(Arrays.asList(abc));
        System.out.println(s);
    }

    public static void main(String[] args) {
        test("A");
        test("A", "C");
    }
}

The spec clearly states that removeAll

"Removes all this collection's elements that are also contained in the specified collection."

So from my understanding current behavior is unpredictable . Please help me understand this

like image 220
Sachin Sachdeva Avatar asked Mar 30 '17 15:03

Sachin Sachdeva


2 Answers

You only read documentation partly. You forgot one important paragraph from TreeSet:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

Now removeAll implementation comes from AbstractSet and utilizes equals method. According to your code you will have that "a".equals("A") is not true so that elements are not considered equal even if you provided a comparator which manages them when used in the TreeSet itself. If you try with a wrapper then the problem goes away:

import java.util.*;
import java.lang.*;

class Test
{
    static class StringWrapper implements Comparable<StringWrapper>
    {
      public final String string;

      public StringWrapper(String string)
      {
        this.string = string;
      }

      @Override public boolean equals(Object o)
      { 
        return o instanceof StringWrapper && 
            ((StringWrapper)o).string.compareToIgnoreCase(string) == 0; 
      }

      @Override public int compareTo(StringWrapper other) { 
        return string.compareToIgnoreCase(other.string); 
      }

      @Override public String toString() { return string; }
    }

    static void test(StringWrapper... abc) 
    {
        Set<StringWrapper> s = new TreeSet<>();
        s.addAll(Arrays.asList(new StringWrapper("a"), new StringWrapper("b")));
        s.removeAll(Arrays.asList(abc));
        System.out.println(s);
    }

    public static void main(String[] args)
    {
        test(new StringWrapper("A"));
        test(new StringWrapper("A"), new StringWrapper("C"));
    }
}

This because you are now providing a consistent implementation between equals and compareTo of your object so you never have incoherent behavior between how the objects are added inside the sorted set and how all the abstract behavior of the set uses them.

This is true in general, a sort of rule of three for Java code: if you implement compareTo or equals or hashCode you should always implement all of them to avoid problems with standard collections (even if hashCode is less crucial unless you are using these objects in any hashed collection). This is specified many times around java documentation.

like image 184
Jack Avatar answered Oct 27 '22 05:10

Jack


This is an inconsistency in the implementation of TreeSet<E>, bordering on the bug. The code will ignore custom comparator when the number of items in the collection that you pass to removeAll is greater than or equal to the number of items in the set.

The inconsistency is caused by a small optimization: if you look at the implementation of removeAll, which is inherited from AbstractSet, the optimization goes as follows:

public boolean removeAll(Collection<?> c) {
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

you can see that the behavior is different when c has fewer items than this set (top branch) vs. when it has as many or more items (bottom branch).

Top branch uses the comparator associated with this set, while the bottom branch uses equals for comparison c.contains(i.next()) - all in the same method!

You can demonstrate this behavior by adding a few extra elements to the original tree set:

s.addAll(Arrays.asList("x", "z", "a", "b"));

Now the output for both test cases becomes identical, because remove(i.next()) utilizes the comparator of the set.

like image 28
Sergey Kalinichenko Avatar answered Oct 27 '22 03:10

Sergey Kalinichenko