Deduplication for String intern method in ConcurrentHashMap

I watched a talk from JavaDays in which the author said that this probabilistic approach is very effective for deduplicating Strings, as an analogue to the String.intern() method:

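import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class CHMDeduplicator<T> {
    // Not a probability but an int threshold: the probability p in [0, 1]
    // mapped onto the full int range [Integer.MIN_VALUE, Integer.MAX_VALUE]
    private final int prob;
    private final Map<T, T> map;

    public CHMDeduplicator(double prob) {
        this.prob = (int) (Integer.MIN_VALUE + prob * (1L << 32));
        this.map = new ConcurrentHashMap<>();
    }

    public T dedup(T t) {
        // With probability of about (1 - p), skip deduplication and return t as-is
        if (ThreadLocalRandom.current().nextInt() > prob) {
            return t;
        }
        // Otherwise return the canonical instance, registering t if it is the first one seen
        T exist = map.putIfAbsent(t, t);
        return (exist == null) ? t : exist;
    }
}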
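The constructor maps the probability p onto the int range, which keeps the per-call check in dedup() down to one nextInt() and one integer comparison: with p = 0.5 the threshold is 0, and nextInt() <= threshold holds for roughly half of all draws. The small sketch below reuses the same formula and measures how often a call would reach the map; the ThresholdDemo class and its threshold and passRate helpers are hypothetical, written only for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration class, not part of the original slides.
class ThresholdDemo {
    // Same formula as the CHMDeduplicator constructor: p in [0, 1] becomes an
    // int threshold; the double-to-int cast saturates at the int bounds.
    static int threshold(double p) {
        return (int) (Integer.MIN_VALUE + p * (1L << 32));
    }

    // Fraction of nextInt() draws that are <= threshold, i.e. how often dedup()
    // would go on to the map instead of returning its argument unchanged.
    static double passRate(double p, int trials) {
        int limit = threshold(p);
        int passed = 0;
        for (int i = 0; i < trials; i++) {
            if (ThreadLocalRandom.current().nextInt() <= limit) {
                passed++;
            }
        }
        return (double) passed / trials;
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.0, 0.1, 0.5, 1.0}) {
            System.out.printf("p=%.1f threshold=%d passRate=%.3f%n",
                    p, threshold(p), passRate(p, 1_000_000));
        }
    }
}
```

Because a double-to-int cast saturates, p = 1.0 yields Integer.MAX_VALUE (every call reaches the map) and p = 0.0 yields Integer.MIN_VALUE (essentially no call does).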

Please explain the effect of the probability check in this line:

if (ThreadLocalRandom.current().nextInt() > prob) return t;

This is the original presentation from Java Days (slide 56): https://shipilev.net/talks/jpoint-April2015-string-catechism.pdf

asked Aug 04 '16 13:08 by pacman



1 Answer

If you look at the next slide, which has a table of data for different probabilities, or listen to the talk, you will see/hear the rationale: probabilistic deduplicators balance the time spent deduplicating Strings against the memory savings that the deduplication brings. This allows one to fine-tune the time spent processing Strings, or even to sprinkle low-probability deduplicators around the code, thus amortizing the deduplication costs.

(Source: these are my slides)

answered Oct 08 '22 09:10 by Aleksey Shipilev

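The time-versus-memory trade-off described in the answer can be approximated with a small single-threaded simulation. This is a hypothetical sketch, not the benchmark behind the table in the slides: the DedupTradeoff class, its run helper, the mapCalls counter and the workload (strings built as "value-" + i % vocab) are all assumptions made for illustration. Here mapCalls stands in for time spent deduplicating, the number of distinct String objects still referenced stands in for memory footprint, and passes models the same string flowing through several sprinkled call sites that share one map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simulation class, not taken from the talk.
class DedupTradeoff {
    // Same logic as CHMDeduplicator from the question, plus a counter of map accesses.
    static final class CountingDeduplicator<T> {
        private final int prob;
        private final Map<T, T> map = new ConcurrentHashMap<>();
        final AtomicLong mapCalls = new AtomicLong();

        CountingDeduplicator(double prob) {
            this.prob = (int) (Integer.MIN_VALUE + prob * (1L << 32));
        }

        T dedup(T t) {
            if (ThreadLocalRandom.current().nextInt() > prob) {
                return t; // skipped: no hashing, no map access
            }
            mapCalls.incrementAndGet();
            T exist = map.putIfAbsent(t, t);
            return (exist == null) ? t : exist;
        }
    }

    // Outcome of one simulated run (hypothetical metrics).
    static final class Result {
        final long mapCalls;      // proxy for time spent deduplicating
        final int distinctCopies; // proxy for memory: String objects still referenced

        Result(long mapCalls, int distinctCopies) {
            this.mapCalls = mapCalls;
            this.distinctCopies = distinctCopies;
        }

        @Override
        public String toString() {
            return "mapCalls=" + mapCalls + ", distinctCopies=" + distinctCopies;
        }
    }

    // Feeds n freshly built strings (vocab distinct values) through `passes`
    // dedup call sites that share one map, then counts the surviving String objects.
    static Result run(double prob, int passes, int n, int vocab) {
        CountingDeduplicator<String> dedup = new CountingDeduplicator<>(prob);
        List<String> retained = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String s = "value-" + (i % vocab); // runtime concatenation: a new object each time
            for (int k = 0; k < passes; k++) {
                s = dedup.dedup(s);
            }
            retained.add(s);
        }
        // Count by identity, not equals(), to see how many copies are actually kept alive.
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(retained);
        return new Result(dedup.mapCalls.get(), identities.size());
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.0, 0.1, 0.5, 1.0}) {
            System.out.println("p=" + p + ", passes=1: " + run(p, 1, 100_000, 100));
        }
        System.out.println("p=0.1, passes=5: " + run(0.1, 5, 100_000, 100));
    }
}
```

With p = 1.0 every call hits the map and only vocab copies survive; lowering p cuts map traffic roughly in proportion but leaves more duplicate copies alive. Several low-probability passes recover part of the savings: with p = 0.1 and 5 passes, a given string escapes deduplication with probability 0.9^5 ≈ 0.59.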