Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How should I define a good hashCode for a circular linked list in Java?

I have set up a circular linked list data structure that represents a word, and each element in the list is a letter from the word. At the bottom of my question are the class definitions of the list and element of the list.

The purpose of the list data structure is to be able to compare cyclic words. So... "picture" and "turepic" are the same cyclic word, so the two lists will be equal.

So I override equals() when comparing two lists, and I've read that whenever you have to override equals(), you have to also override hashCode(). However, I don't really have a good idea of how to do that.

How should I define a good hashCode for what I have set up? What things should I consider? In the example of the "picture" and "turepic", the two lists are equal so their hashCode needs to be the same. Any ideas?

Thanks, Hristo

public class Letter {
 char value;
 Letter theNextNode;

 /**
  * Default constructor for an element of the list.
  * 
  * @param theCharacter - the value for this node.
  */
 Letter(char theCharacter) {
  this.value = theCharacter;
 }
}


public class CircularWord {

 /*
  * Class Variables
  */
 Letter head;
 Letter tail;
 Letter theCurrentNode;

 int iNumberOfElements;


 /**
  * Default Constructor. All characters that make up 'theWord' are stored in a 
  * circular linked list structure where the tail's NEXT is the head. 
  */
 public CircularWord(String theWord) {

  char[] theCharacters = theWord.toCharArray();

  for (int iIndex = 0; iIndex < theCharacters.length; iIndex++) {
   this.addElement(theCharacters[iIndex]);
  }

  this.theCurrentNode = head;
  this.iNumberOfElements = theCharacters.length;
 }
}
like image 912
Hristo Avatar asked Sep 19 '10 19:09

Hristo


1 Answers

So you want a hashcode calculation which gives equal results for "picture" and "turepic", but (preferably) different from the hashcode of e.g. "eruptic". Thus it is not enough to simply add up the hashcodes of the letters contained in the word - you need to have some position information too, but still, it should be independent of the actual permutation of the word. You need to define "equivalence classes", and always calculate the same hashcode for each member of the class.

The easiest way to achieve this is to select a specific member of the equivalence class and always use the hashcode of that variation for all equivalent words. E.g. select the first variant alphabetically (thanks @Michael for summing it up concisely). For "picture" et al., that would be "cturepi". Both "picture" and "turepic" (and all the other equivalent variations) should return the hash code of "cturepi". That hash code could be calculated by the standard LinkedList method, or any other preferred way.

One might say that this calculation is very expensive. True, however one could cache the result, so that only the first calculation would be costly. And I guess the selection of the first alphabetical variant could be optimized fairly much in the common case (compared to the trivial solution of generating all permutations in the specific equivalence class, then sorting them and picking the first).

E.g. in many of the words, the first letter alphabetically is unique ("picture" is one of these - its first letter alphabetically is 'c', and there is only one 'c' in it). So you only need to find it, then calculate the hashcode starting from there. If it is not unique, you need to compare the second, third, etc. letters after that, until you find a difference (or you roll over).

Update 2 - examples

  • "abracadabra" contains 5 'a's. The 2nd characters after the 'a's are 'b', 'c', 'd', 'b' and 'a', respectively. So in the 2nd round of comparison you can conclude that the lexicographically smallest variation is "aabracadabr".
  • "abab" contains 2 'a's, and a 'b' after each (and then you roll over, reaching an 'a' again, so the quest ends there). So you have two identical lexicographically smallest variations of it. But since they are identical, they obviously produce the same hashcode.

Update: In the end, it all boils down to how much do you actually need the hashcode - i.e. do you plan to put your circular lists into an associative collection like Set or Map. If not, you can do with a simple, or even trivial hash method. But if you use some associative collection heavily, a trivial hash implementation gives you lots of collisions thus suboptimal performance. In this case, it is worth a try implementing this hash method and measuring whether it pays for itself in performance.

Update 3: sample code

Letter is basically left the same as above, I only made the fields private, renamed theNextNode to next, and added getters/setters as needed.

In CircularWord I made some more changes: dropped tail and theCurrentNode, and made the word really circular (i.e. last.next == head). The constructor, toString and equals are not relevant for computing the hashcode, so they are omitted for the sake of simplicity.

public class CircularWord {
    private final Letter head;
    private final int numberOfElements;
    
    // constructor, toString(), equals() omitted

    @Override
    public int hashCode() {
        return hashCodeStartingFrom(getStartOfSmallestRotation());
    }

    private Letter getStartOfSmallestRotation() {
        if (head == null) {
            return null;
        }
        Set<Letter> candidates = allLetters();
        int counter = numberOfElements;

        while (candidates.size() > 1 && counter > 0) {
            candidates = selectSmallestSuccessors(candidates);
            counter--;
        }
        return rollOverToStart(counter, candidates.iterator().next());
    }

    private Set<Letter> allLetters() {
        Set<Letter> letters = new LinkedHashSet<Letter>();
        Letter letter = head;

        for (int i = 0; i < numberOfElements; i++) {
            letters.add(letter);
            letter = letter.getNext();
        }
        return letters;
    }

    private Set<Letter> selectSmallestSuccessors(Set<Letter> candidates) {
        Set<Letter> smallestSuccessors = new LinkedHashSet<Letter>();

        char min = Character.MAX_VALUE;
        for (Letter letter : candidates) {
            Letter nextLetter = letter.getNext();
            if (nextLetter.getValue() < min) {
                min = nextLetter.getValue();
                smallestSuccessors.clear();
            }
            if (nextLetter.getValue() == min) {
                smallestSuccessors.add(nextLetter);
            }
        }
        return smallestSuccessors;
    }

    private Letter rollOverToStart(int counter, Letter lastCandidate) {
        for (; counter >= 0; counter--) {
            lastCandidate = lastCandidate.getNext();
        }
        return lastCandidate;
    }

    private int hashCodeStartingFrom(Letter startFrom) {
        int hash = 0;
        Letter letter = startFrom;
        for (int i = 0; i < numberOfElements; i++) {
            hash = 31 * hash + letter.getValue();
            letter = letter.getNext();
        }
        return hash;
    }

}

The algorithm implemented in getStartOfSmallestRotation to find the lexicographically smallest rotation of the word is basically what I describe above: compare and select the lexicographically smallest 1st, 2nd, 3rd etc. letters of each rotation, dropping the greater letters until either there is only one candidate left, or you roll over the word. Since the list is circular, I use a counter to avoid an infinite loop.

In the end, if I have a single candidate left, it may be in the middle of the word and I need to get the start of the smallest word rotation. However, as this is a singly-linked list, it is awkward to step backwards in it. Luckily, the counter nicely helps me out: it has recorded how many letters I have compared so far, but in a circular list this is equivalent to how many letters I can move forward before rolling over. Thus I know how many letters to move forward in order to get again to the beginning of the minimal word rotation I am looking for.

Hope this helps someone - at least it was fun to write :-)

like image 97
11 revs Avatar answered Sep 23 '22 03:09

11 revs