Like many others, I am thinking about the correct approach to sharding my collections in MongoDB. The main question is: how does auto-sharding actually work?
The official docs say: "MongoDB scales horizontally via an auto-sharding (partitioning) architecture" and "To partition a collection, we specify a shard key pattern", with the note "It is important to choose the right shard key for a collection" :).
http://www.mongodb.org/display/DOCS/Sharding+Introduction#ShardingIntroduction-ShardKeys
http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key
Now the question is: is this the right shard key (sharding by the autogenerated ObjectId)?
db.runCommand({ shardcollection : "test.mycollection", key : { _id : 1 } })
(note that shardcollection expects a full "db.collection" namespace, not a bare database name)
What happens internally in Mongo in this case? How will Mongo split the data into chunks? Assuming I initially have 10 million records on 2 shard servers: what happens on the Mongo side when I add 2 more shard servers once the collection reaches 20 million records? I could not find that level of detail in any Mongo-related sources.
Taking into account the random-looking nature of the autogenerated _id and its structure,
... http://www.mongodb.org/display/DOCS/Object+IDs ...
I would shard by the least significant byte (right-to-left order), with chunks split by the value of 2-3 bytes. This would provide an easy way to shard across 2^N shard servers (2, 4, 8, ..., 256) with a more-or-less even load on each shard and minimal required configuration. As far as I understand, Mongo only supports sharding/chunking by explicitly defined ranges, so my idea will not work. Is this true?
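For illustration, here is a minimal Python sketch of the 12-byte ObjectId layout described in the docs (4-byte timestamp, 3-byte machine id, 2-byte pid, 3-byte counter); the machine/pid/counter bytes are replaced with random bytes here, even though the real counter is monotonic per process:

```python
import os
import struct
import time

def fake_object_id():
    """Mimic the 12-byte ObjectId layout: 4-byte big-endian timestamp,
    followed by machine id (3B), pid (2B) and counter (3B), which are
    approximated with random bytes for this sketch."""
    ts = struct.pack(">I", int(time.time()))
    return ts + os.urandom(8)

oid = fake_object_id()
print(oid.hex())   # leading 8 hex chars encode the timestamp
print(oid[-1])     # trailing byte varies between documents
```

Two ids generated close together share their leading (timestamp) bytes but differ in their trailing bytes, which is exactly why the most significant bytes cluster writes while the least significant bytes look evenly distributed.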
It's generally not a good idea to use the default ObjectId as the shard key, since it has an embedded timestamp and increases monotonically over time. This may work fine if you do a lot of updates that touch old and new documents in an evenly distributed fashion. However, it is really bad news if your application is insert-heavy, since the majority of your writes will go to a single shard: the shard that owns the [nearCurrentTimestamp -> infinity] chunk.
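The hot-chunk effect described above can be simulated in a few lines of Python; integer keys stand in for time-ordered ObjectIds, and the chunk boundaries are made up:

```python
import bisect

# Chunk boundaries left behind by earlier splits: keys < 100, 100-199, >= 200.
split_points = [100, 200]
chunk_hits = [0, 0, 0]   # write counter per chunk

def route(key):
    """Send a write to the chunk whose range covers the key."""
    chunk_hits[bisect.bisect_right(split_points, key)] += 1

# Monotonically increasing inserts, like timestamp-prefixed ObjectIds:
for key in range(200, 300):
    route(key)

print(chunk_hits)   # -> [0, 0, 100]: every write hits the last chunk
```

Every new key is larger than all previous ones, so only the shard owning the upper-bound chunk does any insert work.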
Each mongos monitors write traffic to the shards and uses a very simple heuristic to determine whether a chunk has become too big and needs to be split (the threshold size is configurable via chunkSize).
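A toy model of that heuristic, assuming a made-up chunkSize and a fixed document size (the real mongos estimates chunk sizes and picks split points from the actual key distribution):

```python
import random

CHUNK_SIZE_BYTES = 1024
DOC_BYTES = 64
chunks = [{"min": 0, "max": float("inf"), "bytes": 0, "keys": []}]

def insert(key):
    """Route a write to the owning chunk; split it if it grew too big."""
    for c in chunks:
        if c["min"] <= key < c["max"]:
            c["keys"].append(key)
            c["bytes"] += DOC_BYTES
            if c["bytes"] > CHUNK_SIZE_BYTES:
                split(c)
            return

def split(c):
    """Split a chunk at its median key into two smaller ranges."""
    c["keys"].sort()
    mid = c["keys"][len(c["keys"]) // 2]
    if mid == c["min"]:   # cannot split on an existing bound
        return
    right = {"min": mid, "max": c["max"],
             "keys": [k for k in c["keys"] if k >= mid]}
    right["bytes"] = DOC_BYTES * len(right["keys"])
    c["max"] = mid
    c["keys"] = [k for k in c["keys"] if k < mid]
    c["bytes"] = DOC_BYTES * len(c["keys"])
    chunks.append(right)

random.seed(1)
for _ in range(100):
    insert(random.randrange(10_000))

print(len(chunks))   # several chunks after ~100 * 64B of writes
```

With uniformly distributed keys the splits fall roughly at key midpoints, which is what keeps ranges balanced without any fixed, pre-declared boundaries.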
When you add a new shard to the cluster, the balancer (http://www.mongodb.org/display/DOCS/Sharding+Administration#ShardingAdministration-Balancing) will see a chunk imbalance and will start migrating chunks to the new shards.
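The rebalancing that follows can be sketched as a loop that moves chunks from the most-loaded shard to the least-loaded one; the real balancer uses migration thresholds and moves one chunk at a time, so this is only a rough model:

```python
# Chunk counts before scaling out, plus two newly added empty shards.
shards = {"shard0": 10, "shard1": 10, "shard2": 0, "shard3": 0}

def balance(shards):
    """Migrate chunks one at a time until counts differ by at most 1."""
    moves = 0
    while True:
        hi = max(shards, key=shards.get)
        lo = min(shards, key=shards.get)
        if shards[hi] - shards[lo] <= 1:
            return moves
        shards[hi] -= 1   # migrate one chunk hi -> lo
        shards[lo] += 1
        moves += 1

moves = balance(shards)
print(moves, shards)   # -> 10 {'shard0': 5, 'shard1': 5, 'shard2': 5, 'shard3': 5}
```

This matches the scenario in the question: going from 2 shards to 4, roughly half the chunks migrate to the new servers, and no data needs to be re-keyed.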
Mongo supports range-based sharding; however, that does not mean the ranges are fixed, since chunks can be split into smaller ranges and moved around the cluster over time.