
MongoDB sharded collection not rebalancing

We have a relatively simple sharded MongoDB setup: 4 shards with each shard being a replica set that has at least 3 members. Each collection consists of data loaded from a lot of files; each file is given a monotonically increasing ID and the sharding is done based on a hash of the ID.

Most of our collections are working as expected. However, I have one collection that does not seem to be distributing its chunks across shards properly. The collection had ~30GB of data loaded before the index was created and the collection was sharded, but as far as I'm aware this shouldn't matter. Here are the stats for the collection:

mongos> db.mycollection.stats()
{
        "sharded" : true,
        "ns" : "prod.mycollection",
        "count" : 53304954,
        "numExtents" : 37,
        "size" : 35871987376,
        "storageSize" : 38563958544,
        "totalIndexSize" : 8955712416,
        "indexSizes" : {
                "_id_" : 1581720784,
                "customer_code_1" : 1293148864,
                "job_id_1_customer_code_1" : 1800853936,
                "job_id_hashed" : 3365576816,
                "network_code_1" : 914412016
        },
        "avgObjSize" : 672.9578525853339,
        "nindexes" : 5,
        "nchunks" : 105,
        "shards" : {
                "rs0" : {
                        "ns" : "prod.mycollection",
                        "count" : 53304954,
                        "size" : 35871987376,
                        "avgObjSize" : 672.9578525853339,
                        "storageSize" : 38563958544,
                        "numExtents" : 37,
                        "nindexes" : 5,
                        "lastExtentSize" : 2146426864,
                        "paddingFactor" : 1.0000000000050822,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 8955712416,
                        "indexSizes" : {
                                "_id_" : 1581720784,
                                "job_id_1_customer_code_1" : 1800853936,
                                "customer_code_1" : 1293148864,
                                "network_code_1" : 914412016,
                                "job_id_hashed" : 3365576816
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}

And the sh.status() for this collection:

            prod.mycollection
                    shard key: { "job_id" : "hashed" }
                    chunks:
                            rs0     105
                    too many chunks to print, use verbose if you want to force print

Is there something I'm missing as to why this collection will only distribute to rs0? Is there a way to force a rebalance? I performed the same steps to shard other collections and they properly distributed themselves. Here are the stats for a collection that successfully sharded:

mongos> db.myshardedcollection.stats()
{
        "sharded" : true,
        "ns" : "prod.myshardedcollection",
        "count" : 5112395,
        "numExtents" : 71,
        "size" : 4004895600,
        "storageSize" : 8009994240,
        "totalIndexSize" : 881577200,
        "indexSizes" : {
                "_id_" : 250700688,
                "customer_code_1" : 126278320,
                "job_id_1_customer_code_1" : 257445888,
                "job_id_hashed" : 247152304
        },
        "avgObjSize" : 783.3697513591966,
        "nindexes" : 4,
        "nchunks" : 102,
        "shards" : {
                "rs0" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1284540,
                        "size" : 969459424,
                        "avgObjSize" : 754.7133012595949,
                        "storageSize" : 4707762176,
                        "numExtents" : 21,
                        "nindexes" : 4,
                        "lastExtentSize" : 1229475840,
                        "paddingFactor" : 1.0000000000000746,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 190549856,
                        "indexSizes" : {
                                "_id_" : 37928464,
                                "job_id_1_customer_code_1" : 39825296,
                                "customer_code_1" : 33734176,
                                "job_id_hashed" : 79061920
                        },
                        "ok" : 1
                },
                "rs1" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1287243,
                        "size" : 1035438960,
                        "avgObjSize" : 804.384999568846,
                        "storageSize" : 1178923008,
                        "numExtents" : 17,
                        "nindexes" : 4,
                        "lastExtentSize" : 313208832,
                        "paddingFactor" : 1,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 222681536,
                        "indexSizes" : {
                                "_id_" : 67787216,
                                "job_id_1_customer_code_1" : 67345712,
                                "customer_code_1" : 30169440,
                                "job_id_hashed" : 57379168
                        },
                        "ok" : 1
                },
                "rs2" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1131411,
                        "size" : 912549232,
                        "avgObjSize" : 806.5585644827565,
                        "storageSize" : 944386048,
                        "numExtents" : 16,
                        "nindexes" : 4,
                        "lastExtentSize" : 253087744,
                        "paddingFactor" : 1,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 213009328,
                        "indexSizes" : {
                                "_id_" : 64999200,
                                "job_id_1_customer_code_1" : 67836272,
                                "customer_code_1" : 26522944,
                                "job_id_hashed" : 53650912
                        },
                        "ok" : 1
                },
                "rs3" : {
                        "ns" : "prod.myshardedcollection",
                        "count" : 1409201,
                        "size" : 1087447984,
                        "avgObjSize" : 771.6769885914075,
                        "storageSize" : 1178923008,
                        "numExtents" : 17,
                        "nindexes" : 4,
                        "lastExtentSize" : 313208832,
                        "paddingFactor" : 1,
                        "systemFlags" : 0,
                        "userFlags" : 0,
                        "totalIndexSize" : 255336480,
                        "indexSizes" : {
                                "_id_" : 79985808,
                                "job_id_1_customer_code_1" : 82438608,
                                "customer_code_1" : 35851760,
                                "job_id_hashed" : 57060304
                        },
                        "ok" : 1
                }
        },
        "ok" : 1
}

sh.status() for the properly sharded collection:

            prod.myshardedcollection
                    shard key: { "job_id" : "hashed" }
                    chunks:
                            rs2     25
                            rs1     26
                            rs3     25
                            rs0     26
                    too many chunks to print, use verbose if you want to force print

Mason asked Jun 04 '13 02:06



1 Answer

In MongoDB, when you move to a sharded system and you don't see any balancing, it could be one of several things.

  1. You may not have enough data to trigger balancing. That was definitely not your situation, but some people may not realize that with the default chunk size of 64MB it can take a while of inserting data before there is enough to split and balance some of it to other shards.

  2. The balancer may not have been running. Since your other collections were getting balanced, that was unlikely in your case, unless this collection was sharded last, after the balancer was stopped for some reason.

  3. The chunks in your collection can't be moved. This can happen when the shard key is not granular enough to split the data into small enough chunks. That turned out to be your case: your shard key is not granular enough for a collection this large. You have 105 chunks (which probably corresponds to the number of unique job_id values) and over 30GB of data. When chunks are too large for the balancer to move, it marks them as "jumbo" so that it won't spin its wheels trying to migrate them.
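To make the third point concrete, here is a rough back-of-the-envelope check in plain JavaScript. It uses only the "size" and "nchunks" figures from the stats output in the question and assumes the default 64MB chunk size is still in effect. The helper name averageChunkSizeBytes is made up for this sketch.

```javascript
// Figures copied from the db.mycollection.stats() output in the question.
const collectionSizeBytes = 35871987376; // "size"
const chunkCount = 105;                  // "nchunks"

// Assumption: the default 64MB maximum chunk size has not been changed.
const maxChunkSizeBytes = 64 * 1024 * 1024;

// Hypothetical helper: mean data size per chunk.
function averageChunkSizeBytes(sizeBytes, chunks) {
  return sizeBytes / chunks;
}

const avgBytes = averageChunkSizeBytes(collectionSizeBytes, chunkCount);
const avgMB = avgBytes / (1024 * 1024);
const ratio = avgBytes / maxChunkSizeBytes;

console.log("average chunk size (MB): " + avgMB.toFixed(1));
console.log("multiple of the maximum chunk size: " + ratio.toFixed(1));
```

The average comes out at roughly 326MB per chunk, about five times the 64MB limit, which is consistent with chunks that could not be split any further. Running sh.status(true) from mongos prints every chunk, so you can check whether they are actually flagged as jumbo.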

How do you recover from a poor choice of shard key? Normally it's very painful to change the shard key: since the shard key is immutable, you have to do the equivalent of a full data migration to get the data into a collection with a different shard key. However, in your case the collection is still all on one shard, so it should be relatively easy to "unshard" the collection and reshard it with a new shard key. Because the number of job_ids is relatively small, I would recommend sharding on a regular (non-hashed) index on job_id, customer_code, since you probably query on that and I'm guessing it's always set at document creation time.
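As a small illustration of why the compound key helps: a chunk can only be split between two different shard key values, so the number of distinct key values caps the number of chunks. Hashing job_id does not change that, because equal job_ids hash to the same value. The sketch below counts distinct key values for both choices. The sample documents and the distinctKeyCount helper are invented for illustration and are not taken from your data.

```javascript
// Hypothetical sample documents; the values are made up for this sketch.
const docs = [
  { job_id: 1, customer_code: "A" },
  { job_id: 1, customer_code: "B" },
  { job_id: 1, customer_code: "C" },
  { job_id: 2, customer_code: "A" },
  { job_id: 2, customer_code: "B" },
];

// Count the distinct values of a (possibly compound) shard key.
// Documents sharing a key value must live in the same chunk, so this
// count is an upper bound on how many chunks the data can be split into.
function distinctKeyCount(documents, fields) {
  const seen = new Set();
  for (const doc of documents) {
    seen.add(JSON.stringify(fields.map((field) => doc[field])));
  }
  return seen.size;
}

const byJobId = distinctKeyCount(docs, ["job_id"]);
const byCompound = distinctKeyCount(docs, ["job_id", "customer_code"]);

console.log("distinct job_id values: " + byJobId);
console.log("distinct (job_id, customer_code) values: " + byCompound);
```

On the real collection, db.mycollection.distinct("job_id").length in the shell should give the first number. If it is close to 105, that supports the guess above about where the chunk count comes from.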

Asya Kamsky answered Oct 07 '22 14:10
