What is a 'canonical representation' of a field meant to be in an equals() method (Joshua Bloch)?

Tags:

equals

In Effective Java (2nd edition), chapter 3, item 8:

public final class CaseInsensitiveString {
    private final String s;

    public CaseInsensitiveString(String s) {
        if (s == null)
            throw new NullPointerException();
        this.s = s;
    }

    @Override public boolean equals(Object o) {
        return o instanceof CaseInsensitiveString &&
            ((CaseInsensitiveString) o).s.equalsIgnoreCase(s);
    }
    // remainder omitted
}

After describing issues surrounding the equals() method, he goes on to talk about this class in the context of comparing fields.

For some classes, such as CaseInsensitiveString above, field comparisons are more complex than simple equality tests. If this is the case, you may want to store a canonical form of the field, so the equals() method can do cheap exact comparisons on these canonical forms rather than more costly inexact comparisons. This technique is most appropriate for immutable classes; if the object can change, you must keep the canonical form up-to-date.

So my question (and I double-checked what 'canonical' means): what is Bloch talking about? What would the canonical form be? I'm ready to be told that the answer is very simple (presumably otherwise his editor would have told him to add more) but I want to see other people say so.

He mentions the same idea for hashCode() in the next item, item 9.

For context, he also discusses a broken version of the equals() method for CaseInsensitiveString:

// Broken - violates symmetry
@Override public boolean equals(Object o) {
    if (o instanceof CaseInsensitiveString)
        return s.equalsIgnoreCase(
            ((CaseInsensitiveString) o).s);
    if (o instanceof String) // one-way interoperability!
        return s.equalsIgnoreCase((String) o);
    return false;
}
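To see the symmetry violation concretely, here is a small self-contained sketch. The wrapper class SymmetryDemo and its main method are illustrative additions, not from the book; the nested class reproduces the broken equals() above (hashCode() is omitted, as in the book's snippet).

```java
public class SymmetryDemo {

    // Reproduces the book's broken equals(): it knows about String,
    // but String.equals() knows nothing about CaseInsensitiveString.
    static final class CaseInsensitiveString {
        private final String s;

        CaseInsensitiveString(String s) {
            if (s == null) {
                throw new NullPointerException();
            }
            this.s = s;
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof CaseInsensitiveString) {
                return s.equalsIgnoreCase(((CaseInsensitiveString) o).s);
            }
            if (o instanceof String) { // one-way interoperability!
                return s.equalsIgnoreCase((String) o);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        CaseInsensitiveString cis = new CaseInsensitiveString("Polish");
        String s = "polish";
        System.out.println(cis.equals(s)); // true
        System.out.println(s.equals(cis)); // false: String.equals() only accepts Strings
    }
}
```

cis.equals(s) is true while s.equals(cis) is false, which is exactly the asymmetry the "Broken" comment refers to.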


asked Jun 26 '14 10:06

Adam

2 Answers

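You can add another final field that stores the upper-cased value of s. This new field is the canonical representation of the s field, and the new implementation of equals() (see the code below) is cheaper because it compares the canonical forms with a plain String.equals() instead of equalsIgnoreCase(). As Bloch says, this approach is most appropriate for immutable classes; in a mutable class the extra field would have to be recomputed every time s changes.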

Also, don't forget to override hashCode() whenever you override equals().

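import java.util.Locale;

public final class CaseInsensitiveString {

    private final String s;
    private final String sForEquals; // canonical (upper-case) form of s, used by equals() and hashCode()

    public CaseInsensitiveString(String s) {
        if (s == null) {
            // The book throws NullPointerException here; this answer prefers IllegalArgumentException
            throw new IllegalArgumentException("s must not be null");
        }
        this.s = s;
        // Locale.ROOT keeps the canonical form independent of the default locale (e.g. Turkish dotted/dotless i)
        this.sForEquals = s.toUpperCase(Locale.ROOT);
    }

    @Override
    public boolean equals(Object o) {
        // cheap exact comparison of the precomputed canonical forms
        return o instanceof CaseInsensitiveString &&
            ((CaseInsensitiveString) o).sForEquals.equals(this.sForEquals);
    }

    @Override
    public int hashCode() {
        // consistent with equals(): equal canonical forms give equal hash codes
        return sForEquals.hashCode();
    }
    // remainder omitted
}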
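A quick usage check of the idea, as a self-contained sketch: the class is repeated in trimmed form inside a hypothetical CanonicalFormDemo wrapper, with the canonical form computed by toUpperCase(Locale.ROOT) (an assumption made here so the result does not depend on the default locale). Because equals() and hashCode() both use the same canonical field, differently-cased strings collapse into a single HashSet entry.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class CanonicalFormDemo {

    // Trimmed copy of the class above; the canonical field drives equals() and hashCode().
    static final class CaseInsensitiveString {
        private final String s;
        private final String canonical;

        CaseInsensitiveString(String s) {
            if (s == null) {
                throw new IllegalArgumentException("s must not be null");
            }
            this.s = s;
            this.canonical = s.toUpperCase(Locale.ROOT); // computed once, reused by every comparison
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CaseInsensitiveString
                && ((CaseInsensitiveString) o).canonical.equals(canonical);
        }

        @Override
        public int hashCode() {
            return canonical.hashCode();
        }

        @Override
        public String toString() {
            return s;
        }
    }

    public static void main(String[] args) {
        Set<CaseInsensitiveString> set = new HashSet<>();
        set.add(new CaseInsensitiveString("Polish"));
        set.add(new CaseInsensitiveString("POLISH"));
        set.add(new CaseInsensitiveString("polish"));
        System.out.println(set.size()); // 1

        // The canonical form is not identical to equalsIgnoreCase() semantics:
        System.out.println(new CaseInsensitiveString("\u00DF").equals(new CaseInsensitiveString("SS"))); // true
        System.out.println("\u00DF".equalsIgnoreCase("SS")); // false
    }
}
```

The last two lines show that the choice of canonical form defines the equivalence: "ß".toUpperCase(Locale.ROOT) is "SS", so this version treats "ß" and "SS" as equal, whereas equalsIgnoreCase() compares strings of equal length character by character and returns false.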


answered Oct 19 '22 23:10

Sergey Morozov

The term canonical has several usages. It applies to values that have several representations (or to several distinct values that are considered equal). One specific representation (or value) is then usually chosen as the canonical one.

Example, sets of integers: { 2, 3, 5 } = { 3, 5, 2 } = { 2, 2, 5, 3 } = ..., where the sorted, duplicate-free { 2, 3, 5 } is the canonical form.
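The set example can be sketched in Java by choosing "sorted ascending, duplicates removed" as the canonical form (that rule is an assumption; any fixed rule would do). The class CanonicalSetDemo and the helper canonical are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class CanonicalSetDemo {

    // Canonical form of integers viewed as a set: TreeSet sorts and drops duplicates.
    static List<Integer> canonical(Integer... values) {
        return new ArrayList<>(new TreeSet<>(Arrays.asList(values)));
    }

    public static void main(String[] args) {
        System.out.println(canonical(2, 3, 5));    // [2, 3, 5]
        System.out.println(canonical(3, 5, 2));    // [2, 3, 5]
        System.out.println(canonical(2, 2, 5, 3)); // [2, 3, 5]
        // Different representations, one canonical form: a plain List.equals() now suffices.
        System.out.println(canonical(3, 5, 2).equals(canonical(2, 2, 5, 3))); // true
    }
}
```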

For the plain Java String there is an issue too. The same text in Unicode can be represented in different ways: ĉ either as the single code point U+0109 LATIN SMALL LETTER C WITH CIRCUMFLEX ("\u0109"), or as two code points, U+0063 LATIN SMALL LETTER C followed by the zero-width U+0302 COMBINING CIRCUMFLEX ACCENT ("\u0063\u0302").

So even a plain String should be canonicalized in some cases:

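import java.text.Normalizer;

String s = "...";
// NFKD: compatibility decomposition, so a precomposed ĉ becomes "c" followed by U+0302
String s1 = Normalizer.normalize(s, Normalizer.Form.NFKD);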

This uses java.text.Normalizer to decompose the string. One advantage is that when sorting, "c" and "ĉ" stay together. One could also remove the combining diacritical marks with a regex to get a plain version of the text without accents.
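A minimal sketch of the regex idea, assuming NFD (canonical decomposition) rather than the NFKD used above; NFKD additionally applies compatibility mappings, such as splitting the "ﬁ" ligature into "f" and "i". The class StripDiacriticsDemo and the helper stripMarks are hypothetical names, and the regex \p{M} matches any Unicode combining mark:

```java
import java.text.Normalizer;

public class StripDiacriticsDemo {

    // Decompose (NFD), then drop every combining mark (Unicode general category M).
    static String stripMarks(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripMarks("\u0109"));                 // c
        System.out.println(stripMarks("Esperanto: \u0109i tio")); // Esperanto: ci tio
    }
}
```

Letters that have no decomposition, such as ø (U+00F8), pass through unchanged, so the result is only pure ASCII when every non-ASCII letter decomposes into a base letter plus marks.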

In fact, different operating systems handle Unicode file names differently, and version control systems do not always respect a cross-platform canonicalization.

Only after both strings have been normalized (to the same form) does a comparison with String.equals reliably indicate Unicode text equality.
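Applied to the ĉ example, a sketch of normalization-based comparison could look like this. It normalizes both sides to NFC (NFC or NFD both work, as long as both strings use the same form); the names NormalizedEqualsDemo and unicodeEquals are hypothetical. In the canonical-field pattern from the question, the normalized string is what one would store in the extra final field.

```java
import java.text.Normalizer;

public class NormalizedEqualsDemo {

    // Compare two strings by their NFC forms.
    static boolean unicodeEquals(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String precomposed = "\u0109";       // LATIN SMALL LETTER C WITH CIRCUMFLEX
        String decomposed = "\u0063\u0302";  // c + COMBINING CIRCUMFLEX ACCENT

        System.out.println(precomposed.equals(decomposed));                    // false
        System.out.println(precomposed.length() + " " + decomposed.length());  // 1 2
        System.out.println(unicodeEquals(precomposed, decomposed));            // true
    }
}
```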

answered Oct 19 '22 21:10

Joop Eggen
