Sharding GridFS on MongoDB

I'm reading up on GridFS and the possibility of sharding it across different machines.

According to the documentation, the suggested shard key is files_id on the chunks collection. This key references the _id of the corresponding document in the files collection, and since that _id is an ObjectId by default, it is incremental: every new file I save in GridFS gets a new, larger _id.

The O'Reilly book "Scaling MongoDB" discourages incremental shard keys because they create hotspots (the last shard receives all the writes and reads).
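To make the hotspot concern concrete, here is a minimal toy model in Python. It is only a sketch: the split points and key values are made up for illustration, and it ignores the chunk splitting and balancing that a real cluster performs. It only shows that keys which keep growing always fall into the last range.

```python
import bisect
from collections import Counter

def shard_for(key, split_points):
    """Return the index of the key range that contains `key`."""
    return bisect.bisect_right(split_points, key)

# Hypothetical split points between ranges: 4 ranges, one per shard in this toy model.
split_points = [1000, 2000, 3000]

# New keys grow past every existing split point, as an incremental _id would.
new_keys = range(3001, 3101)

# Count how many writes each shard receives.
writes = Counter(shard_for(k, split_points) for k in new_keys)
print(writes)  # every one of the 100 writes lands on shard index 3
```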

What is your suggestion for sharding the GridFS collections?
Has anybody experienced the hotspot problem?

Thank you.

ALoR asked Mar 17 '11 20:03




1 Answer

You should shard on files_id to keep a file's chunks together, but you are correct that this will create a hotspot. If you can, use something other than ObjectId for the _id values in the fs.files collection (MD5 hashes would probably be better than ObjectIds).

We'll be adding hashing for sharding, which will solve this, but not until at least 2.0.
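The MD5 suggestion above could look roughly like the following Python sketch. The helper names `md5_file_id` and `toy_shard` are hypothetical, and the four-way split on the first hex digit is an assumption made only to show that hash-based ids spread across key ranges instead of piling up at the end.

```python
import hashlib
from collections import Counter

def md5_file_id(content: bytes) -> str:
    """Derive a file _id from the file content (hypothetical helper)."""
    return hashlib.md5(content).hexdigest()

def toy_shard(file_id: str, num_shards: int = 4) -> int:
    """Map an id to a range by its first hex digit (toy model, not MongoDB's logic)."""
    return int(file_id[0], 16) * num_shards // 16

# Made-up file contents standing in for uploaded files.
files = [f"file-{i}".encode() for i in range(1000)]
ids = [md5_file_id(content) for content in files]

# Count how many ids fall into each of the four toy ranges.
spread = Counter(toy_shard(file_id) for file_id in ids)
print(sorted(spread.items()))
```

With PyMongo, such a value can be passed as the `_id` keyword argument to `GridFS.put`. Note that two files with identical content would produce the same MD5 and therefore collide on `_id`, so this only fits if duplicates are rejected or deduplicated on purpose.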

kristina answered Sep 28 '22 08:09