Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"Closed" HashMap in Java/Scala

Very often the performance bottleneck when using hash maps is the equals method. equals can be very expensive for deep data structures.

Note, that the following is about immutable hash maps. Thus, at least you will never remove a key. I think adding keys should be ok.

Unsafe get

Suppose you query a hash map, being certain it contains the queried key. Then if there is no collision for the given key, the found single entry can be returned just based on the hash hit, because it has to be the queried object.

This can avoid calling equals in get in most cases (when there is no collision).

Questions

  1. How is this concept called?
  2. Are there any hash map implementations available for Java or Scala, that support such an unsafe get operation?

Btw, I'm open for suggestions of a better subject line.

like image 213
ziggystar Avatar asked Dec 20 '22 13:12

ziggystar


2 Answers

How is this concept called?

Identity map. It won't call equals() when looking for elements and just use identity (i.e. ==).

The correct solution isn't to "fix" the map but to use only keys which use the default Object.equals():

public boolean equals( Object other ) { return this == other; }

The problem is that finding elements in this map can be problematic unless all keys are singletons. So you can't use String, for example, because Java doesn't guarantee that all string instances are interned. The same is true for Integer: Instances < -128 and > 127 will be different.

But if you use your own optimized implementation for keys, you can solve this.

like image 135
Aaron Digulla Avatar answered Jan 04 '23 11:01

Aaron Digulla


Such a data structure doesn't exist to my knowledge exactly because it's unsafe - there is no way to reliably encode your particular condition.

But, alas, the Scala Collection Library allows one to implement new rich data structures very quickly with a small amount of fresh code. Here's an implementation of your request:

class ParticularHashMap[A, +B] private(buckets: Vector[List[(A, B)]]) extends Map[A, B]{

    def this() = this(Vector.fill(ParticularHashMap.BucketsNo)(List.empty))

    def get(key: A) = {
        val bucket = buckets(bucketIndex(key))

        bucket match {
            case List((_, v)) => Some(v) //YOUR SPECIAL OPTIMIZATION !
            case list => list.find(key == _._1).map(_._2)
        }
    }

    def iterator = buckets.flatten.iterator

    def -(key: A) = mkWithUpdatedBucket(key, _ filterNot (_._1 == key))

    def +[B1 >: B](kv: (A, B1)) = mkWithUpdatedBucket(kv._1, insert(kv._1, kv._2.asInstanceOf[B], _))

    //if you're wondering why it's Bucket[A, Any] and not Bucket[A, B] it's because of the covariance of B
    private def mkWithUpdatedBucket(key: A, f: Bucket[A, Any] => Bucket[A, Any]) = {
        val index = bucketIndex(key)
        val newBucket = f(buckets(index))
        (new ParticularHashMap[A, Any](buckets.updated(index, newBucket))).asInstanceOf[ParticularHashMap[A, B]]
    }

    private def insert(k: A, v: Any, bucket: List[(A, Any)]): Bucket[A, Any] = bucket match {
        case List() => List((k, v))
        case (k1, v1) :: kvs if k == k1 => (k, v) :: kvs
        case (k1, v1) :: kvs => (k1, v1) :: insert(k, v, kvs)
    }

    private def bucketIndex(key: A) = Math.abs(key.hashCode()) % ParticularHashMap.BucketsNo
}

object ParticularHashMap {
    private val BucketsNo = 256

    private type Bucket[+A, +B] = List[(A, B)]

    def apply[A, B](kvs: (A, B)*) = new ParticularHashMap[A, B] ++ kvs

}

Please note that this is a naive implementation, but you can improve it too suit your needs.

It will work as expected:

val myMap = ParticularHashMap("hello" -> 2, "world" -> 4)
println(myMap.get("hello")) >> Some(2)
println(myMap.get("world")) >> Some(4)
println(myMap.get("world1")) >> None

But, of course, beware that if you don't respect your contract, bad things will happen. Below is an especially crafted example to show the downside:

val evilMap = ParticularHashMap(1 -> 2)
println(evilMap.get(1)) >> Some(2)
println(evilMap.get(257)) >> Some(2) //BUT 257 IS NOT IN THE MAP !!!
like image 27
Marius Danila Avatar answered Jan 04 '23 13:01

Marius Danila