Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does allocation strategy usePowerOf2Sizes work?

It seems that the allocation strategy usePowerOf2Sizes has no effect on the padding factor of the collection. Is there something I am missing or is this a bug? Is there an example that demonstrates the effect of usePowerOf2Sizes on the padding factor?

I have tried the following experiment:

  1. Insert several documents to mongodb.
  2. Randomly pick several documents, change their size, and save them.
  3. Check the padding factor of the collection.
  4. Repeat the steps 2 and 3 several times, observing the padding factor.

I expected that the resulting padding factor of this collection would be smaller for the exact fit allocation strategy than for the power of 2 sizes allocation strategy.

What I really observed, however, was the same padding factor regardless of the used allocation strategy.

This is the code I used (for mongo version 2.6.4):

function randomChoice(arr) {
    return arr[getRandomInt(0, arr.length)]
}

function getRandomInt(min, max) {
  return Math.floor(Math.random() * (max - min)) + min;
}

function prepare(count, usePowerOf2Sizes) {
    db.aaa.drop();
    db.aaa.createIndex({"i": 1})

    for (var i=0; i<count; i++) {
        db.aaa.insert({"i": i});
    }
    db.runCommand({"collMod": "aaa", "usePowerOf2Sizes" : usePowerOf2Sizes })
}

function changeSizes(actionCount) {
    vals = [100, 150, 225, 340, 500, 760, 1200].map(function (n) {return Array(n + 1).join("a")})

    count = db.aaa.count()

    for (var i=0; i<actionCount; i++) {
        d = db.aaa.find({"i": getRandomInt(0, count)})[0];
        d["a"] = randomChoice(vals);
        db.aaa.save(d);
    }
}

function runPaddingTest(documentCount, usePowerOf2Sizes, actionCount, iterations) {
    prepare(documentCount, usePowerOf2Sizes);
    print("Use padding:", db.aaa.stats().userFlags)
    print("Padding:", db.aaa.stats().paddingFactor)
    d1 = new Date().getTime();
    for(var i=0; i<iterations; i++) {
        changeSizes(actionCount);
        print("Padding:", db.aaa.stats().paddingFactor)
    }
    d2 = new Date().getTime();
    print("Millis:", d2-d1)
}

Example outputs:

// 100 documents, power of 2 sizes allocation strategy, 7*300 document changes
> runPaddingTest(100, true, 300, 7);
Use padding: 1
Padding: 1
Padding: 1.1840000000000002
Padding: 1.1800000000000068
Padding: 1.1360000000000148
Padding: 1.0660000000000238
Padding: 1.0050000000000323
Padding: 1
Padding: 1
Millis: 2661

// 100 documents, exact allocation strategy, 7*300 document changes
> runPaddingTest(100, false, 300, 7);
Use padding: 0
Padding: 1
Padding: 1.2129999999999992
Padding: 1.224000000000005
Padding: 1.1750000000000131
Padding: 1.1150000000000215
Padding: 1.0430000000000303
Padding: 1
Padding: 1
Millis: 2641

// 1000 documents, power of 2 sizes allocation strategy, 7*300 document changes
> runPaddingTest(1000, true, 300, 7);
Use padding: 1
Padding: 1
Padding: 1.451999999999991
Padding: 1.8189999999999849
Padding: 1.999
Padding: 1.9980000000000002
Padding: 2
Padding: 1.999
Padding: 1.9980000000000002
Millis: 2730

// 1000 documents, exact allocation strategy, 7*300 document changes
> runPaddingTest(1000, false, 300, 7);
Use padding: 0
Padding: 1
Padding: 1.447999999999991
Padding: 1.784999999999986
Padding: 1.9980000000000002
Padding: 1.999
Padding: 2
Padding: 1.999
Padding: 1.999
Millis: 2728
like image 465
marcelka Avatar asked Oct 31 '22 13:10

marcelka


1 Answers

I don't think padding factor is relevant for usePowerOf2Sizes. For power of 2 sizes, the space allocated in bytes for a new spot for a record (for an insert or an update requiring a document move) is something like

2^ceiling(log_2(bsonsize(record))

i.e. the smallest power of two larger than the document size. For the exact fit allocation strategy, the space is

padding factor * bsonsize(document)

The padding factor is some number between 1 and 2 that MongoDB adjusts automatically based on how it sees documents in the collection grow. Note that it's not an empirical measure of how much padding, just some guess at how much padding should be used in the near future, and it's not relevant for power of 2 sizes. I don't know exactly how it is calculated - you'll have to read the code for that - but the fact that different allocation strategies lead to different record sizes lead to different patterns of documents on disk and patterns of document moves on update means it's not surprising the padding factor differs when doing the same operations on the same collection created with different allocation strategies.

like image 117
wdberkeley Avatar answered Nov 15 '22 07:11

wdberkeley