I have set up a circular linked list data structure that represents a word, and each element in the list is a letter from the word. At the bottom of my question are the class definitions of the list and element of the list.
The purpose of the list data structure is to be able to compare cyclic words. So... "picture" and "turepic" are the same cyclic word, so the two lists will be equal.
So I override equals()
when comparing two lists, and I've read that whenever you have to override equals()
, you have to also override hashCode()
. However, I don't really have a good idea of how to do that.
How should I define a good hashCode for what I have set up? What things should I consider? In the example of the "picture" and "turepic", the two lists are equal so their hashCode needs to be the same. Any ideas?
Thanks, Hristo
public class Letter {
char value;
Letter theNextNode;
/**
* Default constructor for an element of the list.
*
* @param theCharacter - the value for this node.
*/
Letter(char theCharacter) {
this.value = theCharacter;
}
}
public class CircularWord {
/*
* Class Variables
*/
Letter head;
Letter tail;
Letter theCurrentNode;
int iNumberOfElements;
/**
* Default Constructor. All characters that make up 'theWord' are stored in a
* circular linked list structure where the tail's NEXT is the head.
*/
public CircularWord(String theWord) {
char[] theCharacters = theWord.toCharArray();
for (int iIndex = 0; iIndex < theCharacters.length; iIndex++) {
this.addElement(theCharacters[iIndex]);
}
this.theCurrentNode = head;
this.iNumberOfElements = theCharacters.length;
}
}
So you want a hashcode calculation which gives equal results for "picture" and "turepic", but (preferably) different from the hashcode of e.g. "eruptic". Thus it is not enough to simply add up the hashcodes of the letters contained in the word - you need to have some position information too, but still, it should be independent of the actual permutation of the word. You need to define "equivalence classes", and always calculate the same hashcode for each member of the class.
The easiest way to achieve this is to select a specific member of the equivalence class and always use the hashcode of that variation for all equivalent words. E.g. select the first variant alphabetically (thanks @Michael for summing it up concisely). For "picture" et al., that would be "cturepi". Both "picture" and "turepic" (and all the other equivalent variations) should return the hash code of "cturepi". That hash code could be calculated by the standard LinkedList method, or any other preferred way.
One might say that this calculation is very expensive. True, however one could cache the result, so that only the first calculation would be costly. And I guess the selection of the first alphabetical variant could be optimized fairly much in the common case (compared to the trivial solution of generating all permutations in the specific equivalence class, then sorting them and picking the first).
E.g. in many of the words, the first letter alphabetically is unique ("picture" is one of these - its first letter alphabetically is 'c', and there is only one 'c' in it). So you only need to find it, then calculate the hashcode starting from there. If it is not unique, you need to compare the second, third, etc. letters after that, until you find a difference (or you roll over).
Update 2 - examples
Update: In the end, it all boils down to how much do you actually need the hashcode - i.e. do you plan to put your circular lists into an associative collection like Set
or Map
. If not, you can do with a simple, or even trivial hash method. But if you use some associative collection heavily, a trivial hash implementation gives you lots of collisions thus suboptimal performance. In this case, it is worth a try implementing this hash method and measuring whether it pays for itself in performance.
Letter
is basically left the same as above, I only made the fields private
, renamed theNextNode
to next
, and added getters/setters as needed.
In CircularWord
I made some more changes: dropped tail
and theCurrentNode
, and made the word really circular (i.e. last.next == head
). The constructor, toString
and equals
are not relevant for computing the hashcode, so they are omitted for the sake of simplicity.
public class CircularWord {
private final Letter head;
private final int numberOfElements;
// constructor, toString(), equals() omitted
@Override
public int hashCode() {
return hashCodeStartingFrom(getStartOfSmallestRotation());
}
private Letter getStartOfSmallestRotation() {
if (head == null) {
return null;
}
Set<Letter> candidates = allLetters();
int counter = numberOfElements;
while (candidates.size() > 1 && counter > 0) {
candidates = selectSmallestSuccessors(candidates);
counter--;
}
return rollOverToStart(counter, candidates.iterator().next());
}
private Set<Letter> allLetters() {
Set<Letter> letters = new LinkedHashSet<Letter>();
Letter letter = head;
for (int i = 0; i < numberOfElements; i++) {
letters.add(letter);
letter = letter.getNext();
}
return letters;
}
private Set<Letter> selectSmallestSuccessors(Set<Letter> candidates) {
Set<Letter> smallestSuccessors = new LinkedHashSet<Letter>();
char min = Character.MAX_VALUE;
for (Letter letter : candidates) {
Letter nextLetter = letter.getNext();
if (nextLetter.getValue() < min) {
min = nextLetter.getValue();
smallestSuccessors.clear();
}
if (nextLetter.getValue() == min) {
smallestSuccessors.add(nextLetter);
}
}
return smallestSuccessors;
}
private Letter rollOverToStart(int counter, Letter lastCandidate) {
for (; counter >= 0; counter--) {
lastCandidate = lastCandidate.getNext();
}
return lastCandidate;
}
private int hashCodeStartingFrom(Letter startFrom) {
int hash = 0;
Letter letter = startFrom;
for (int i = 0; i < numberOfElements; i++) {
hash = 31 * hash + letter.getValue();
letter = letter.getNext();
}
return hash;
}
}
The algorithm implemented in getStartOfSmallestRotation
to find the lexicographically smallest rotation of the word is basically what I describe above: compare and select the lexicographically smallest 1st, 2nd, 3rd etc. letters of each rotation, dropping the greater letters until either there is only one candidate left, or you roll over the word. Since the list is circular, I use a counter to avoid an infinite loop.
In the end, if I have a single candidate left, it may be in the middle of the word and I need to get the start of the smallest word rotation. However, as this is a singly-linked list, it is awkward to step backwards in it. Luckily, the counter nicely helps me out: it has recorded how many letters I have compared so far, but in a circular list this is equivalent to how many letters I can move forward before rolling over. Thus I know how many letters to move forward in order to get again to the beginning of the minimal word rotation I am looking for.
Hope this helps someone - at least it was fun to write :-)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With