Proper Implementation of Hashed Shard Key In MongoDB

Tags:

mongodb

I have a collection that is currently indexed/queried by the built-in "_id" (ObjectId). I don't want to shard on this key since it is sequential (date-prefixed). The documentation for Mongo 2.4 says that I can shard on a hash of this key, which sounds great. Like so:

sh.shardCollection( "records.active", { _id: "hashed" } )

Question: do I have to first create the hashed index on the active collection with:

db.active.ensureIndex({ _id: "hashed" })

Or is that not necessary? I don't want to waste space with more indexing than is necessary.

Related question: if I do create a hashed index with ensureIndex({ _id: "hashed" }), can I drop the default "_id" index? Will Mongo know to take queries on the _id field, hash them, and run them against the hashed index?

Thanks...

motormal asked Mar 28 '13 12:03



1 Answer

Both the _id index and the hashed _id index will be needed. In MongoDB 2.4 you do not have to explicitly call db.active.ensureIndex({ _id: "hashed" }) before sharding your collection; if you don't, sh.shardCollection( "records.active", { _id: "hashed" } ) will create the hashed index for you.
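
One way to check this on your own cluster is to shard the collection and then list its indexes. The sketch below is mongo-shell JavaScript. The helper name indexKeysAfterSharding is hypothetical, and passing sh and db in as arguments is only a convenience so the function can also be exercised outside the shell. It assumes sharding is already enabled on the records database and that db points at it.

```javascript
// Hypothetical helper (not part of MongoDB): shard records.active on a
// hashed _id, then return the key pattern of every index on the collection.
// In the mongo shell, pass the built-in `sh` and `db` objects, with `db`
// set to the "records" database.
function indexKeysAfterSharding(sh, db) {
  // Same call as in the question.
  sh.shardCollection("records.active", { _id: "hashed" });

  // getIndexes() returns one document per index; keep only the key patterns.
  return db.active.getIndexes().map(function (ix) {
    return ix.key;
  });
}
```

In the shell you could run `printjson(indexKeysAfterSharding(sh, db))` after `use records`. If the behavior is as described above, the list should contain both the default `{ _id: 1 }` key and the `{ _id: "hashed" }` key.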

The default _id index is required for replication, so you cannot drop it.

To shard a collection in MongoDB, you must have an index on the shard key. This has not changed in MongoDB 2.4, so the hashed _id index is required for sharding to work.
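
For intuition on why hashing helps with a sequential key like the ObjectId mentioned in the question, here is a rough simulation in plain JavaScript (Node.js). It is only a sketch under loud assumptions: keys are plain increasing integers, "range" sharding is modeled as equal contiguous slices, and "hashed" sharding is modeled as MD5 modulo the shard count. MongoDB's real hash function and chunk assignment differ; the function names are made up for this example.

```javascript
const crypto = require("crypto");

// Range partitioning (simplified): split [0, maxKey) into equal contiguous slices.
function rangeShard(key, maxKey, shards) {
  return Math.min(shards - 1, Math.floor((key / maxKey) * shards));
}

// Hash partitioning (simplified): MD5 the key, read the first 4 bytes as an
// unsigned integer, and take it modulo the number of shards.
function hashShard(key, shards) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readUInt32BE(0) % shards;
}

// Count how many of the newest `recent` keys land on each shard.
function countRecentWrites(total, recent, shards, pick) {
  const counts = new Array(shards).fill(0);
  for (let key = total - recent; key < total; key++) {
    counts[pick(key)] += 1;
  }
  return counts;
}

const SHARDS = 4;
const TOTAL = 100000;
const RECENT = 1000;

const byRange = countRecentWrites(TOTAL, RECENT, SHARDS, (k) => rangeShard(k, TOTAL, SHARDS));
const byHash = countRecentWrites(TOTAL, RECENT, SHARDS, (k) => hashShard(k, SHARDS));

console.log("range:", byRange); // all recent keys fall in the last slice
console.log("hash: ", byHash);  // recent keys are spread across the slices
```

With the range model, every one of the newest keys lands on the last shard, which is the hot-spot the question wants to avoid. With the hash model, the same keys are scattered across all four shards.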

James Wahlin answered Oct 15 '22 02:10