
How to find the real chunk sizes in MongoDB

I am trying to find the size of all chunks in one of my sharded collections.

I'd like to know the real size, not the size hint given to mongos as a setting, which I know I can find with:

use config
db.settings.find({_id : "chunksize"})

I have tried several solutions, but count operations are very slow, so this is not easy. Do you know a solution? (Shell, C#, Python, Ruby, bash, I don't care.)

So far I have tried the following:

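db.getSiblingDB("config").chunks.find({ns : "mydb.mycollection"}).forEach(function(chunk) {
     // count the documents between this chunk's min and max shard-key bounds
     var n = db.getSiblingDB("mydb").mycollection.find({}, {_id : 0, partnerId : 1, id : 1}).min(chunk.min).max(chunk.max).count();
     print(chunk._id + ": " + n);
})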

but this is too slow. I suspect it does not use the index on my shard key, {partnerId : 1, id : 1}.
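For reference, a chunk owns the half-open range of shard-key values from chunk.min (inclusive) to chunk.max (exclusive), compared field by field in shard-key order. The following is a minimal sketch in plain JavaScript (not the mongo shell) of that comparison for the {partnerId : 1, id : 1} key. The helper names compareKeys and inChunk are hypothetical, and the sketch assumes same-typed, directly comparable values; it ignores MinKey/MaxKey bounds and full BSON cross-type ordering.

```javascript
// Hypothetical helpers illustrating how a document maps to a chunk on an
// ascending compound shard key. Real MongoDB uses full BSON ordering
// (including MinKey/MaxKey); this sketch assumes plain comparable values.

// Compare two key objects field by field, in shard-key order.
// Returns -1, 0 or 1.
function compareKeys(a, b, keyFields) {
  for (const field of keyFields) {
    if (a[field] < b[field]) return -1;
    if (a[field] > b[field]) return 1;
  }
  return 0;
}

// A chunk owns the keys in the half-open range [chunk.min, chunk.max).
function inChunk(doc, chunk, keyFields) {
  return (
    compareKeys(doc, chunk.min, keyFields) >= 0 &&
    compareKeys(doc, chunk.max, keyFields) < 0
  );
}

const keyFields = ["partnerId", "id"];
const chunk = { min: { partnerId: 5, id: 100 }, max: { partnerId: 7, id: 20 } };

console.log(inChunk({ partnerId: 5, id: 100 }, chunk, keyFields)); // true (min is inclusive)
console.log(inChunk({ partnerId: 6, id: 1 }, chunk, keyFields));   // true (partnerId strictly inside)
console.log(inChunk({ partnerId: 7, id: 20 }, chunk, keyFields));  // false (max is exclusive)
console.log(inChunk({ partnerId: 5, id: 99 }, chunk, keyFields));  // false (below min)
```

The shell's cursor.min() and cursor.max() bound an index scan with this kind of compound bound, so they only make sense with an index on {partnerId : 1, id : 1}; recent MongoDB versions also require an explicit .hint() naming that index.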

I have also tried replacing count with explain, without any luck, and replacing count with a JavaScript forEach that counts manually (hoping for an indexOnly query that would not hit disk).

I am trying to find the real size because I have seen several chunks that are far above the chunk size given as a hint (2 GB instead of 64 MB).
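Once per-chunk sizes have been measured, spotting chunks like these 2 GB ones is plain arithmetic against the chunksize setting, whose value field is stored in megabytes. A minimal sketch, assuming the sizes were already collected into an array of { id, sizeBytes } objects (a hypothetical shape) and falling back to 64 MB, the default in MongoDB versions of this era, when config.settings has no chunksize document. findOversizedChunks is a hypothetical helper name.

```javascript
// Hypothetical helper: flag chunks whose measured size exceeds the configured
// chunk size. `settingsDoc` is assumed to be the result of
// db.getSiblingDB("config").settings.findOne({ _id: "chunksize" }) (or null);
// its `value` field is in megabytes. `chunks` is an assumed array of
// { id, sizeBytes } objects gathered beforehand.
const DEFAULT_CHUNK_SIZE_MB = 64; // assumption: default for this MongoDB era

function findOversizedChunks(settingsDoc, chunks) {
  const chunkSizeMB =
    settingsDoc && settingsDoc.value ? settingsDoc.value : DEFAULT_CHUNK_SIZE_MB;
  const limitBytes = chunkSizeMB * 1024 * 1024;
  // Keep only chunks above the limit and report how many times larger they are.
  return chunks
    .filter((c) => c.sizeBytes > limitBytes)
    .map((c) => ({ id: c.id, ratio: c.sizeBytes / limitBytes }));
}

const chunks = [
  { id: "chunkA", sizeBytes: 30 * 1024 * 1024 },        // 30 MB
  { id: "chunkB", sizeBytes: 2 * 1024 * 1024 * 1024 },  // 2 GB
];
console.log(findOversizedChunks({ _id: "chunksize", value: 64 }, chunks));
// [ { id: 'chunkB', ratio: 32 } ]
```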

asked Sep 11 '12 15:09 by kamaradclimber



2 Answers

The command that will probably help you most is dataSize. One caveat: it takes longer to run on larger collections, so your mileage may vary.

Given that, you could try something similar to the following:

var ns = "mydb.mycollection";      // the full namespace of the collection
var key = {partnerId : 1, id : 1}; // the shard key of the collection

db.getSiblingDB("config").chunks.find({ns : ns}).forEach(function(chunk) {
    // dataSize measures only the documents in this chunk's [min, max) range
    var ds = db.getSiblingDB(ns.split(".")[0]).runCommand({dataSize : chunk.ns, keyPattern : key, min : chunk.min, max : chunk.max});
    // ds.size is reported in bytes
    print("Chunk: " + chunk._id + " has a size of " + ds.size + " bytes, and includes " + ds.numObjects + " objects (took " + ds.millis + "ms)");
});
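The loop above builds one dataSize command per chunk. The following sketch moves that step into a pure function so the command documents can be inspected before they are run. buildDataSizeCommand is a hypothetical helper name, and the sample chunk values are made up. The optional estimate field is a real dataSize option that derives the size from the collection's average document size instead of fetching every document; it is off by default here. In the shell, each returned pair would be run as db.getSiblingDB(dbName).runCommand(command).

```javascript
// Hypothetical helper: build the dataSize command for one config.chunks
// document, plus the name of the database it must run against.
function buildDataSizeCommand(chunk, keyPattern, estimate = false) {
  const dot = chunk.ns.indexOf(".");
  if (dot < 0) throw new Error("namespace must be <db>.<collection>: " + chunk.ns);
  // Database names cannot contain ".", so the first dot separates the
  // database from the collection (which may itself contain dots).
  const dbName = chunk.ns.slice(0, dot);
  const command = {
    dataSize: chunk.ns,
    keyPattern: keyPattern,
    min: chunk.min,
    max: chunk.max,
  };
  if (estimate) command.estimate = true; // approximate size from average object size
  return { dbName, command };
}

// Made-up chunk document with the fields the command needs.
const chunk = {
  ns: "mydb.mycollection",
  min: { partnerId: 5, id: 100 },
  max: { partnerId: 7, id: 20 },
};
const { dbName, command } = buildDataSizeCommand(chunk, { partnerId: 1, id: 1 });
console.log(dbName);               // mydb
console.log(Object.keys(command)); // [ 'dataSize', 'keyPattern', 'min', 'max' ]
```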
answered Oct 07 '22 09:10 by Andre de Frere


After some experimentation, I found no easier way than using count in versions before 2.2. The following is the script I use with my compound shard key (partnerId, id).

var collection = "products";
var database = "products";
var ns = database + "." + collection;
rs.slaveOk(true); // allow reads from a secondary member
db.getSiblingDB("config").chunks.find({ns : ns}).forEach(function(chunk) {
  var pMin = chunk.min.partnerId;
  var pMax = chunk.max.partnerId;
  // partnerIds strictly between the bounds: every id belongs to the chunk
  var midR = {partnerId : {$gt : pMin, $lt : pMax}};
  // lowest partnerId: only ids from chunk.min.id upward
  var lowR = {partnerId : pMin, id : {$gte : chunk.min.id}};
  // a chunk inside a single partnerId is bounded on both sides
  if (pMin == pMax) lowR.id = {$gte : chunk.min.id, $lt : chunk.max.id};
  // highest partnerId: only ids below chunk.max.id
  var upR = {partnerId : pMax, id : {$lt : chunk.max.id}};
  var target = db.getSiblingDB(database);
  var a = target.runCommand({count : collection, query : lowR, fields : {partnerId : 1, _id : 0}}).n;
  var b = target.runCommand({count : collection, query : midR, fields : {partnerId : 1, _id : 0}}).n;
  var c = 0;
  if (pMin != pMax)
    c = target.runCommand({count : collection, query : upR, fields : {partnerId : 1, _id : 0}}).n;
  // shard | min | max | low count | mid count | up count | total
  print(chunk.shard + "|" + tojson(chunk.min) + "|" + tojson(chunk.max) + "|" + a + "|" + b + "|" + c + "|" + (a + b + c));
});
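The script works because, on the compound key {partnerId, id}, the range [chunk.min, chunk.max) splits into at most three disjoint pieces: the tail of the lowest partnerId, the partnerIds strictly in between, and the head of the highest partnerId. Each piece is either a single-field range or an equality plus a range, which is easy to count on the shard-key index. The following sketch reproduces that split as a pure function (splitChunkRange and the small matches evaluator are hypothetical names) and checks it against a direct lexicographic comparison on made-up sample documents. Like the script, it assumes ordinary numeric bounds and does not handle MinKey/MaxKey.

```javascript
// Split [min, max) on {partnerId, id} into the disjoint queries the script
// counts: low (partnerId == pMin), mid (strictly between), up (== pMax).
function splitChunkRange(min, max) {
  const pMin = min.partnerId;
  const pMax = max.partnerId;
  if (pMin === pMax) {
    // Single partnerId: one equality plus a half-open range on id.
    return [{ partnerId: pMin, id: { $gte: min.id, $lt: max.id } }];
  }
  return [
    { partnerId: pMin, id: { $gte: min.id } },
    { partnerId: { $gt: pMin, $lt: pMax } },
    { partnerId: pMax, id: { $lt: max.id } },
  ];
}

// Minimal evaluator for the query subset produced above (equality, $gt,
// $gte, $lt on numbers); used only to test the split.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const v = doc[field];
    if (typeof cond !== "object") return v === cond;
    if ("$gt" in cond && !(v > cond.$gt)) return false;
    if ("$gte" in cond && !(v >= cond.$gte)) return false;
    if ("$lt" in cond && !(v < cond.$lt)) return false;
    return true;
  });
}

// Reference: direct lexicographic test of min <= (partnerId, id) < max.
function inRange(doc, min, max) {
  const geMin = doc.partnerId > min.partnerId ||
    (doc.partnerId === min.partnerId && doc.id >= min.id);
  const ltMax = doc.partnerId < max.partnerId ||
    (doc.partnerId === max.partnerId && doc.id < max.id);
  return geMin && ltMax;
}

// Made-up sample documents on a 6 x 6 grid of key values.
const docs = [];
for (let p = 0; p < 6; p++) for (let i = 0; i < 6; i++) docs.push({ partnerId: p, id: i });

// The pieces are disjoint, so their counts can simply be added (a + b + c).
function countBySplit(min, max) {
  return splitChunkRange(min, max)
    .map((q) => docs.filter((d) => matches(d, q)).length)
    .reduce((a, b) => a + b, 0);
}

const min = { partnerId: 1, id: 3 };
const max = { partnerId: 4, id: 2 };
console.log(countBySplit(min, max), docs.filter((d) => inRange(d, min, max)).length); // 17 17
```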
answered Oct 07 '22 09:10 by kamaradclimber