
mongodb sharding - chunks are not having the same size

I am new to MongoDB. Since I have to store around 50 million documents, I set up a MongoDB sharded cluster with two replica sets (shards).

The document looks like this:

{
    "_id" : "predefined_unique_id",
    "appNr" : "abcde",
    "modifiedDate" : ISODate("2016-09-16T13:00:57.000Z"),
    "size" : NumberLong(803),
    "crc32" : NumberLong(538462645)
}

The shard key is appNr (chosen for query performance reasons: all documents with the same appNr have to stay within one chunk). Usually multiple documents share the same appNr.
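For reference, this is roughly how the collection was sharded (the database and collection names here are placeholders):

// ranged shard key on appNr
sh.enableSharding("mydb")
sh.shardCollection("mydb.my_collection", { "appNr" : 1 })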

After loading about two million records, I see the chunks are equally balanced; however, when running db.my_collection.getShardDistribution(), I get:

Shard rs0 at rs0/...
 data : 733.97MiB docs : 5618348 chunks : 22
 estimated data per chunk : 33.36MiB
 estimated docs per chunk : 255379

Shard rs1 at rs1/...

 data : 210.09MiB docs : 1734181 chunks : 19
 estimated data per chunk : 11.05MiB
 estimated docs per chunk : 91272

Totals
 data : 944.07MiB docs : 7352529 chunks : 41
 Shard rs0 contains 77.74% data, 76.41% docs in cluster, avg obj size on shard : 136B
 Shard rs1 contains 22.25% data, 23.58% docs in cluster, avg obj size on shard : 127B

My question is: what settings should I change in order to get the data equally distributed between the shards? I would also like to understand how the data gets split into chunks. I have defined a ranged shard key and a chunk size of 264 MB.

Asked by DariusNica

1 Answer

MongoDB uses the shard key associated with the collection to partition the data into chunks. A chunk consists of a subset of the sharded data. Each chunk has an inclusive lower and an exclusive upper range based on the shard key.
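For illustration, each chunk is tracked as a document in the config database's chunks collection, where min and max hold the inclusive lower and exclusive upper bound of the range (the namespace and values below are made up for this example):

use config
db.chunks.find({ "ns" : "mydb.my_collection" }).limit(1).pretty()

// example output (illustrative values)
{
    "_id" : "mydb.my_collection-appNr_\"abcde\"",
    "ns" : "mydb.my_collection",
    "min" : { "appNr" : "abcde" },
    "max" : { "appNr" : "abfff" },
    "shard" : "rs0"
}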

(Diagram: the shard key value space segmented into smaller ranges, or chunks.) The mongos routes writes to the appropriate chunk based on the shard key value. MongoDB splits chunks when they grow beyond the configured chunk size. Both inserts and updates can trigger a chunk split.

The smallest range a chunk can represent is a single unique shard key value. A chunk that only contains documents with a single shard key value cannot be split.
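If one appNr value holds a very large number of documents, the chunk covering it can grow well beyond the configured size but still cannot be split, and the balancer flags it as jumbo. A quick way to look for such chunks from a mongos shell (the namespace is a placeholder):

use config
// jumbo chunks cannot be split or migrated by the balancer
db.chunks.find({ "ns" : "mydb.my_collection", "jumbo" : true })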

Chunk size has a major impact on how data is distributed across the shards.

The default chunk size in MongoDB is 64 megabytes. We can increase or reduce the chunk size, but this should be done only after considering the points below:

  1. Small chunks lead to a more even distribution of data, at the expense of more frequent migrations. This creates overhead at the query routing (mongos) layer.
  2. Large chunks lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer, but these efficiencies come at the expense of a potentially uneven distribution of data.
  3. Chunk size affects the maximum number of documents per chunk to migrate.
  4. Chunk size affects the maximum collection size when sharding an existing collection. Post-sharding, chunk size does not constrain collection size.

Considering this information together with your shard key "appNr", the uneven distribution has most likely happened because of the chunk size.

Try lowering the chunk size from the 264 MB you currently have and see whether the document distribution changes. This is a trial-and-error approach, though, and it can take a considerable amount of time and several iterations.
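As a sketch, the cluster-wide chunk size is stored in the config database and can be changed from a mongos shell, for example to bring it back to the 64 MB default:

use config
// value is the chunk size in megabytes and applies cluster-wide
db.settings.save({ "_id" : "chunksize", "value" : 64 })

Note that lowering the chunk size does not resize existing chunks immediately; splits happen gradually as chunks receive inserts and updates.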

Reference : https://docs.mongodb.com/v3.2/core/sharding-data-partitioning/

Hope it helps!

Answered by Clement Amarnath


