I watched a talk from Java Days in which the author said that this probabilistic approach is a very effective way to deduplicate Strings, as an analogue of the String.intern() method:
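    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ThreadLocalRandom;

    public class CHMDeduplicator<T> {
        // Threshold on the int range; a random int above it skips the map.
        private final int prob;
        private final Map<T, T> map;

        public CHMDeduplicator(double prob) {
            // Maps prob in [0, 1] onto [Integer.MIN_VALUE, Integer.MAX_VALUE].
            this.prob = (int) (Integer.MIN_VALUE + prob * (1L << 32));
            this.map = new ConcurrentHashMap<>();
        }

        public T dedup(T t) {
            // Skip deduplication for roughly (1 - prob) of the calls.
            if (ThreadLocalRandom.current().nextInt() > prob) {
                return t;
            }
            // Return the canonical instance, registering t if it is the first.
            T exist = map.putIfAbsent(t, t);
            return (exist == null) ? t : exist;
        }
    }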
Please explain what effect the probability has in this line:
if (ThreadLocalRandom.current().nextInt() > prob) return t;
This is the original presentation from Java Days: https://shipilev.net/talks/jpoint-April2015-string-catechism.pdf (slide 56)
If you look at the next slide, which has a table of measurements for different probabilities, or listen to the talk, you will see or hear the rationale: probabilistic deduplicators balance the time spent deduplicating the Strings against the memory savings coming from the deduplication. This lets you fine-tune the time spent processing Strings, or even sprinkle low-probability deduplicators around the code, thus amortizing the deduplication costs.
(Source: these are my slides)
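To make the role of the threshold concrete, here is a minimal sketch that reuses the constructor's arithmetic and counts how often the early return would not fire. The class name ProbThresholdDemo and its two helper methods are hypothetical and not part of the slides; the trial count is arbitrary.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical demo class; only the threshold formula comes from the question.
public class ProbThresholdDemo {

    // Same arithmetic as the CHMDeduplicator constructor:
    // prob = 0.0 gives Integer.MIN_VALUE, prob = 0.5 gives 0,
    // prob = 1.0 saturates to Integer.MAX_VALUE in the narrowing cast.
    public static int threshold(double prob) {
        return (int) (Integer.MIN_VALUE + prob * (1L << 32));
    }

    // Fraction of simulated calls that would reach the map,
    // i.e. calls for which the early "return t" is not taken.
    public static double measuredHitRate(double prob, int trials) {
        int threshold = threshold(prob);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            boolean skipped = ThreadLocalRandom.current().nextInt() > threshold;
            if (!skipped) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        for (double p : new double[] {0.0, 0.1, 0.5, 1.0}) {
            System.out.printf("prob=%.1f threshold=%d measured=%.3f%n",
                    p, threshold(p), measuredHitRate(p, 1_000_000));
        }
    }
}
```

Since nextInt() is uniform over all 2^32 int values, the share of values at or below the threshold is approximately prob. So the measured fraction should land close to the requested probability, and only that share of dedup calls pays for the map lookup.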