Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using UUIDs for cheap equals() and hashCode()

I have an immutable class, TokenList, which consists of a list of Token objects, which are also immutable:

@Immutable
public final class TokenList {

    private final List<Token> tokens;

    public TokenList(List<Token> tokens) {
        this.tokens = Collections.unmodifiableList(new ArrayList(tokens));
    }

    public List<Token> getTokens() {
        return tokens;
    }
}

I do several operations on these TokenLists that take multiple TokenLists as inputs and return a single TokenList as the output. There can be arbitrarily many TokenLists going in, and each can have arbitrarily many Tokens.

These operations are expensive, and there is a good chance that the same operation (ie the same inputs) will be performed multiple times, so I would like to cache the outputs. However, performance is critical, and I am worried about the expense of performing hashCode() and equals() on these objects that may contain arbitrarily many elements (as they are immutable then hashCode could be cached, but equals will still be expensive).

This led me to wondering whether I could use a UUID to provide equals() and hashCode() simply and cheaply by making the following updates to TokenList:

@Immutable
public final class TokenList {

    private final List<Token> tokens;
    private final UUID uuid;

    public TokenList(List<Token> tokens) {
        this.tokens = Collections.unmodifiableList(new ArrayList(tokens));
        this.uuid = UUID.randomUUID();
    }

    public List<Token> getTokens() {
        return tokens;
    }

    public UUID getUuid() {
        return uuid;
    }
}

And something like this to act as a cache key:

@Immutable
public final class TopicListCacheKey {

    private final UUID[] uuids;

    public TopicListCacheKey(TopicList... topicLists) {
        uuids = new UUID[topicLists.length];
        for (int i = 0; i < uuids.length; i++) {
            uuids[i] = topicLists[i].getUuid();
        }
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(uuids);
    }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (other instanceof TopicListCacheKey)
            return Arrays.equals(uuids, ((TopicListCacheKey) other).uuids);
        return false;
    }
}

I figure that there are 2^128 different UUIDs and I will probably have at most around 1,000,000 TokenList objects active in the application at any time. Given this, and the fact that the UUIDs are used combinatorially in cache keys, it seems that the chances of this producing the wrong result are vanishingly small. Nevertheless, I feel uneasy about going ahead with it as it just feels 'dirty'. Are there any reasons I should not use this system? Will the performance costs of the SecureRandom used by UUID.randomUUID() outweigh the gains (especially since I expect multiple threads to be doing this at the same time)? Are collisions going to be more likely than I think? Basically, is there anything wrong with doing it this way??

Thanks.

like image 554
Tom McIntyre Avatar asked Nov 15 '12 10:11

Tom McIntyre


1 Answers

What you are trying is very tricky and needs detailed analysis. So you need to check below questions before deciding on any approach.

These operations are expensive, and there is a good chance that the same operation (ie the same inputs) will be performed multiple times

1) When you say "Same input" in the above line, what do you mean exactly? Does this mean, exact same object i.e. one object referred through several references (same memory location) or does it mean memory-wise separate objects but having logically the same data??

Here if the object is same i.e. same memory location, then == comparison would do fine. For this you have to keep object reference as key in cache.

But if it's the second case i.e. memory-wise separate objects but logically same, then I don't think UUID will help you. Because you have to make sure that such 2 separate objects will get same UUID. THis won't be much easy as anyways you have to go through whole TokenList data to make sure of this

2) Using hashcode in cache, is it safe? I suggest not to use hashcode as key because even though 2 objects are different, they may have the same hashcode. So your logic may go horribly wrong.

So get answers for these questions clear first & only then think about approach.

like image 61
Ravi K Avatar answered Oct 05 '22 06:10

Ravi K