MongoDB 2.4 introduced a new feature, hashed indexes, that everyone seems to be talking about, so here is my question (sorry if it is a simple one):
Basically, a hash index is an array of N buckets, or slots, each containing a pointer to a row. A hash index uses a hash function F(K, N) that, given a key K and the number of buckets N, maps the key to the corresponding bucket of the index.
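A minimal sketch of that description, assuming MD5 as the hash function and "hash modulo N" as the bucket mapping. The names `hash_bucket` and `HashIndex` are hypothetical, integer row positions stand in for row pointers, and this is a generic hash table rather than a model of MongoDB's own index structure.

```python
import hashlib

def hash_bucket(key: str, n: int) -> int:
    """Plays the role of F(K, N): hash the key, then reduce it to one of N buckets."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

class HashIndex:
    """Toy hash index: N buckets, each a list of (key, row_pointer) pairs.

    A bucket holds a list rather than a single pointer so that two keys
    hashing to the same bucket (a collision) can coexist.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.buckets = [[] for _ in range(n)]

    def insert(self, key: str, row_pointer: int) -> None:
        self.buckets[hash_bucket(key, self.n)].append((key, row_pointer))

    def lookup(self, key: str) -> list:
        # Only the single bucket the key maps to is scanned.
        bucket = self.buckets[hash_bucket(key, self.n)]
        return [ptr for k, ptr in bucket if k == key]

idx = HashIndex(8)
idx.insert("alice", 0)
idx.insert("bob", 1)
print(idx.lookup("bob"))  # [1]
```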
MongoDB hashed indexes truncate floating-point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field holding 2.3, 2.2, or 2.9.
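The truncation rule can be sketched as below. It only models the two steps described above (truncate, then hash); `hashed_value` is a hypothetical name, and packing the integer as 8 little-endian bytes before MD5 is an assumption, so the numbers will not match what MongoDB actually stores.

```python
import hashlib
import math
import struct

def hashed_value(x: float) -> int:
    """Illustrative only: truncate to a 64-bit int, then hash those 8 bytes."""
    truncated = math.trunc(x)               # 2.3, 2.2 and 2.9 all become 2
    packed = struct.pack("<q", truncated)   # assumed encoding: signed 64-bit, little-endian
    digest = hashlib.md5(packed).digest()
    return struct.unpack("<q", digest[:8])[0]

same = {hashed_value(x) for x in (2.3, 2.2, 2.9)}
print(len(same))  # 1: all three truncate to 2 before hashing
```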
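The basic idea behind hashing is to distribute key/value pairs across an array of placeholders, or "buckets", in the hash table. With this method, the hash value itself is computed from the key alone, independently of the size of the hash table; the table size only matters when that value is reduced to a bucket.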
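To illustrate that the hash depends only on the key, here is a small sketch, assuming MD5 truncated to 64 bits (`key_hash` is a hypothetical name): the value is computed once, and only the final reduction to a bucket uses the table size N.

```python
import hashlib

def key_hash(key: str) -> int:
    """Computed from the key alone; the table size plays no part."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

h = key_hash("user:42")
for n in (8, 16, 1024):
    # Same h every time; only the bucket number changes with N.
    print(f"N={n}: bucket {h % n}")
```

This is why a hashed value can be stored in an index once and reused: in MongoDB it is the space of hashed values, not a fixed bucket count, that gets split into chunk ranges.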
The idea is that you can create a hashed index on a field that you want to use as the shard key but that, by itself, would give poor write distribution (for example, a monotonically increasing value, which would create a hotspot on the shard holding the most recent entries).
The value stored in the hashed index is 64 bits of the 128-bit MD5 hash. The goal is to allow sharding by the hash of the key without the application needing to know anything about the hashing mechanism.
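A sketch of both points, with the assumptions stated up front: `hash64` and `chunk_of` are hypothetical names, the key is packed as 8 little-endian bytes before MD5 (an assumed encoding, so real MongoDB hashed values will differ), and the four chunks and their boundaries are invented for illustration. It shows monotonically increasing ids piling into one chunk under a plain range shard key, while their 64-bit hashes spread across all four.

```python
import hashlib
import struct
from collections import Counter

def hash64(key: int) -> int:
    """Keep 64 of MD5's 128 bits, read as a signed 64-bit integer."""
    digest = hashlib.md5(struct.pack("<q", key)).digest()  # 16 bytes = 128 bits
    return struct.unpack("<q", digest[:8])[0]              # first 8 bytes = 64 bits

def chunk_of(value: int, boundaries: list) -> int:
    """Index of the chunk whose range contains value (boundaries sorted ascending)."""
    return sum(1 for b in boundaries if value >= b)

# Hypothetical setup: 4 chunks, new documents get auto-incrementing ids.
new_keys = range(1000, 2000)

# Range sharding on the raw key: every new id is >= 750, so all go to chunk 3.
raw = Counter(chunk_of(k, [250, 500, 750]) for k in new_keys)

# Sharding on hash64(key): split the signed 64-bit space into 4 equal ranges.
hashed = Counter(chunk_of(hash64(k), [-2**62, 0, 2**62]) for k in new_keys)

print(raw)     # Counter({3: 1000})
print(hashed)  # roughly 250 per chunk
```

The application keeps inserting plain increasing ids; only the index sees the hashed form, which matches the goal of not exposing the hashing mechanism.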
You can find more information here: http://docs.mongodb.org/manual/core/sharded-cluster-internals/#sharding-hashed-shard-key-internals