Sharding GridFS on MongoDB

I'm reading up on GridFS and the possibility of sharding it across different machines.

According to the documentation here, the suggested shard key is chunks.files_id. This key refers to the _id of the files collection, so it is incremental: every new file I save in the grid gets a new, incrementing _id.

The O'Reilly book "Scaling MongoDB" discourages incremental shard keys because they create hotspots (the last shard receives all the writes and reads).

What is your suggestion for sharding the GridFS collection?
Has anybody experienced the hotspot problem?

Thank you.

asked Mar 17 '11 20:03

ALoR

1 Answer

You should shard on files_id to keep file chunks together, but you are correct that this will create a hotspot. If you can, use something other than ObjectId for _ids in the fs.files collection (MD5s would probably be better than ObjectIds, since their values do not increase with insertion order).
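
To illustrate the MD5 suggestion, here is a minimal sketch using only the Python standard library. It derives an _id from the file contents and counts how such ids would fall across range-based partitions. The helper names (`md5_file_id`, `shard_for`), the split points, and the sample file contents are all made up for the example; real chunk ranges are chosen by the cluster, not by you.

```python
import bisect
import hashlib


def md5_file_id(data):
    """Return the hex MD5 digest of the file contents, usable as an _id."""
    return hashlib.md5(data).hexdigest()


# Hypothetical split points that divide the hex key space into 4 equal ranges:
# [0, 4), [4, 8), [8, c), [c, ...).
BOUNDARIES = ["4", "8", "c"]


def shard_for(key):
    """Range-based routing: return the index of the range that contains key."""
    return bisect.bisect_right(BOUNDARIES, key)


# Made-up file contents standing in for files saved one after another.
files = [("contents of file %d" % n).encode() for n in range(1000)]

counts = [0] * (len(BOUNDARIES) + 1)
for data in files:
    counts[shard_for(md5_file_id(data))] += 1

# Each range receives a share of the inserts instead of only the last one.
print(counts)
```

With PyMongo, such a value can be supplied when storing the file, since `GridFS.put` accepts an `_id` keyword argument. One caveat of deriving the id from the contents: two identical files produce the same _id, so the second insert would be rejected as a duplicate unless you handle that case.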

We'll be adding hashing for sharding, which will solve this, but not until at least 2.0.
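
The idea behind hashing the shard key can be sketched as follows. This is a simplified model, not MongoDB's implementation: the hash function, the modulo placement, the shard count, and the names `hashed_shard` and `route_chunk` are assumptions for illustration (a real cluster assigns ranges of hash values to shards). It shows that incrementing ids scatter across shards while all chunks of one file still land together.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_shard(files_id):
    """Route on a hash of files_id rather than on the raw value (stand-in hash)."""
    digest = hashlib.md5(str(files_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def route_chunk(chunk):
    """A chunk document is routed by its files_id only; the chunk number n is ignored."""
    return hashed_shard(chunk["files_id"])


# Incrementing integers stand in for ids that grow with insertion order.
placement = {fid: hashed_shard(fid) for fid in range(1000, 1200)}
shards_used = set(placement.values())

# All chunks of a single file share one files_id, so they share one shard.
chunks = [{"files_id": 1000, "n": n} for n in range(10)]
chunk_shards = {route_chunk(c) for c in chunks}

print(sorted(shards_used), chunk_shards)
```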

answered Sep 28 '22 08:09

kristina
