
Bulk update array of matching sub document in Mongodb

I am running MongoDB 3.6. Below is the structure of my document, which stores monthly rate information for a list of products:

{
  "_id": 12345,
  "_class": "com.example.ProductRates",
  "rates": [
    {
      "productId": NumberInt(1234),
      "rate": 100.0,
      "rateCardId": NumberInt(1),
      "month": NumberInt(201801)
    },
    {
      "productId": NumberInt(1234),
      "rate": 200.0,
      "rateCardId": NumberInt(1),
      "month": NumberInt(201802)
    },
    {
      "productId": NumberInt(1234),
      "rate": 400.0,
      "rateCardId": NumberInt(2),
      "month": NumberInt(201803)
    },
    {
      "productId": NumberInt(1235),
      "rate": 500.0,
      "rateCardId": NumberInt(1),
      "month": NumberInt(201801)
    },
    {
      "productId": NumberInt(1235),
      "rate": 234,
      "rateCardId": NumberInt(2),
      "month": NumberInt(201803)
    }
  ]
}

Any change to the associated rate card propagates updates to multiple sub-documents in the 'rates' array.

Below are the changes that need to be applied to the above document:

{
    "productId" : NumberInt(1234), 
    "rate" : 400.0, 
    "rateCardId": NumberInt(1),
    "month" : NumberInt(201801)
}, 
{
    "productId" : NumberInt(1234), 
    "rate" : 500.0, 
    "rateCardId": NumberInt(1),
    "month" : NumberInt(201802)
}, 
{
    "productId" : NumberInt(1235), 
    "rate" : 700.0, 
    "rateCardId": NumberInt(1),
    "month" : NumberInt(201802)
}

Is there a way to update the sub-documents in the 'rates' array incrementally, without loading the entire document into memory in order to merge the changes? Let's say the identifier for a sub-document is the combination of rates.[].productId, rates.[].month and rates.[].rateCardId.

I am able to update multiple sub-documents at once using $[<identifier>] in 3.6, but only with the same value.

db.avail.rates_copy.update(
  { "_id" : 12345 },
  { $set: { "rates.$[item].rate": 0  } },
  { multi: true, 
   arrayFilters: [ { "item.rateCardId": {$in: [ 1, 2]} } ]
  }
)

In my case, however, the values differ between sub-documents based on the identifier combinations mentioned above, which come from a different system.

Is there a way to say: update all the sub-documents that match (productId, month and rateCardId) from the change set with the new values?

Kumaran asked Apr 16 '18 23:04



1 Answer

The short answer is both "yes" and "no".

There is indeed a way to match individual array elements and update them with separate values in a single statement, since you can in fact provide "multiple" arrayFilters conditions and use those identifiers in your update statement.

The problem with your particular sample here is that one of the entries in your "change set" ( the last one ) does not actually match any array member that is currently present. The "presumed" action here would be to $push that new un-matched member into the array where it was not found. However that particular action cannot be done in a "single operation", but you can use bulkWrite() to issue "multiple" statements to cover that case.

Matching Different Array Conditions

Taking that step by step, consider the first two items in your "change set". You can apply a "single" update statement with multiple arrayFilters like this:

db.avail_rates_copy.updateOne(
  { "_id": 12345 },
  { 
    "$set": {
      "rates.$[one]": {
        "productId" : NumberInt(1234), 
        "rate" : 400.0, 
        "rateCardId": NumberInt(1),
        "month" : NumberInt(201801)
      },
      "rates.$[two]": {
        "productId" : NumberInt(1234), 
        "rate" : 500.0, 
        "rateCardId": NumberInt(1),
        "month" : NumberInt(201802)
      } 
    }
  },
  { 
    "arrayFilters": [
      {
        "one.productId": NumberInt(1234),
        "one.rateCardId": NumberInt(1),
        "one.month": NumberInt(201801)
      },
      {
        "two.productId": NumberInt(1234),
        "two.rateCardId": NumberInt(1),
        "two.month": NumberInt(201802)
      }
    ]
  }
)

If you ran that you would see the modified document becomes:

{
        "_id" : 12345,
        "_class" : "com.example.ProductRates",
        "rates" : [
                {                             // Matched and changed this by one
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {                            // And this as two
                        "productId" : 1234,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201802
                },
                {
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 2,
                        "month" : 201803
                },
                {
                        "productId" : 1235,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {
                        "productId" : 1235,
                        "rate" : 234,
                        "rateCardId" : 2,
                        "month" : 201803
                }
        ]
}

Note here that you specify each "identifier" within the list of arrayFilters, with multiple conditions to match the element, like so:

  {
    "one.productId": NumberInt(1234),
    "one.rateCardId": NumberInt(1),
    "one.month": NumberInt(201801)
  },

So each "condition" effectively maps as:

  <identifier>.<property>

The update knows to look at the "rates" array because of the $[<identifier>] placeholder in the update block:

 "rates.$[one]"

MongoDB then looks at each element of "rates" to match the conditions. The "one" identifier matches on the conditions prefixed with "one", and likewise the "two" identifier on the conditions prefixed with "two", so each part of the update statement applies only to the elements that match the conditions assigned to its identifier.
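To make that mapping concrete, here is a minimal plain-JavaScript sketch (not MongoDB code, and not how the server is implemented) that mimics how the conditions for an identifier select array elements. The helper names matchesFilter and selectIndexes are made up for illustration, and only simple equality conditions are handled:

```javascript
// Sample "rates" array from the question (plain numbers instead of NumberInt).
const rates = [
  { productId: 1234, rate: 100, rateCardId: 1, month: 201801 },
  { productId: 1234, rate: 200, rateCardId: 1, month: 201802 },
  { productId: 1234, rate: 400, rateCardId: 2, month: 201803 },
  { productId: 1235, rate: 500, rateCardId: 1, month: 201801 },
  { productId: 1235, rate: 234, rateCardId: 2, month: 201803 }
];

// Hypothetical helper: strip the "<identifier>." prefix from each key and
// compare the remaining property name against a single array element.
function matchesFilter(element, identifier, filter) {
  const prefix = identifier + '.';
  return Object.keys(filter).every(key => {
    if (!key.startsWith(prefix)) return false; // condition belongs to another identifier
    return element[key.slice(prefix.length)] === filter[key];
  });
}

// Hypothetical helper: indexes of the elements an identifier would select.
function selectIndexes(array, identifier, filter) {
  return array
    .map((element, index) => (matchesFilter(element, identifier, filter) ? index : -1))
    .filter(index => index !== -1);
}

const one = { 'one.productId': 1234, 'one.rateCardId': 1, 'one.month': 201801 };
const two = { 'two.productId': 1234, 'two.rateCardId': 1, 'two.month': 201802 };
// The last "change set" entry from the question, which has no counterpart in the array.
const three = { 'three.productId': 1235, 'three.rateCardId': 1, 'three.month': 201802 };

const selectedByOne = selectIndexes(rates, 'one', one);
const selectedByTwo = selectIndexes(rates, 'two', two);
const selectedByThree = selectIndexes(rates, 'three', three);

console.log(selectedByOne, selectedByTwo, selectedByThree);
```

In this model "one" selects only the first element, "two" only the second, and "three" selects nothing, which is why an update through $[three] would leave the array untouched.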

If you just wanted to set the "rate" property as opposed to the whole object, then you just notate it as:

{ "$set": { "rates.$[one].rate": 400, "rates.$[two].rate": 500 } }

Adding Un-matched Objects

So the first part is relatively simple to comprehend, but as stated doing a $push for the "element which is not there" is a different matter, since we basically need a query condition on the "document" level in order to determine that an array element is "missing".

What this essentially means is that you need to issue an update with the $push looking for each array element to see if it exists or not. When it is not present, then the document is a match and the $push is performed.

This is where bulkWrite() comes into play, and you use it by adding an additional update to our first operation above for every element in the "change set":

db.avail_rates_copy.bulkWrite(
  [
    { "updateOne": {
      "filter": { "_id": 12345 },
      "update": {
        "$set": {
          "rates.$[one]": {
            "productId" : NumberInt(1234), 
            "rate" : 400.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201801)
          },
          "rates.$[two]": {
            "productId" : NumberInt(1234), 
            "rate" : 500.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          },
          "rates.$[three]": {
            "productId" : NumberInt(1235), 
            "rate" : 700.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          }
        }
      },
      "arrayFilters": [
        {
          "one.productId": NumberInt(1234),
          "one.rateCardId": NumberInt(1),
          "one.month": NumberInt(201801)
        },
        {
          "two.productId": NumberInt(1234),
          "two.rateCardId": NumberInt(1),
          "two.month": NumberInt(201802)
        },
        {
          "three.productId": NumberInt(1235),
          "three.rateCardId": NumberInt(1),
          "three.month": NumberInt(201802)
        }
      ]    
    }},
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId" : NumberInt(1234), 
              "rateCardId": NumberInt(1),
              "month" : NumberInt(201801)
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId" : NumberInt(1234), 
            "rate" : 400.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201801)
          }
        }
      }
    }},
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId" : NumberInt(1234), 
              "rateCardId": NumberInt(1),
              "month" : NumberInt(201802)
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId" : NumberInt(1234), 
            "rate" : 500.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          }
        }
      }
    }},
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId" : NumberInt(1235),
              "rateCardId": NumberInt(1),
              "month" : NumberInt(201802)
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId" : NumberInt(1235),
            "rate" : 700.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          }
        }
      }
    }}
  ],
  { "ordered": true }
)

Note the $elemMatch within the query filter, as this is a requirement to match an array element by "multiple conditions". We didn't need that on the arrayFilters entries because they are already evaluated against each individual array item, but as a "query" the conditions require $elemMatch, since simple "dot notation" would return incorrect matches.

Also see that the $not operator is used here to "negate" the $elemMatch, as our true condition is to only match a document which "has no matching array element" for the provided conditions, and that is what justifies selecting it for appending a new element.
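As a rough illustration of why "dot notation" gives the wrong answer here, the following plain-JavaScript sketch models the two matching behaviours against the sample array. The functions dotNotationMatches and elemMatchMatches are hypothetical stand-ins for the query semantics, handling equality conditions only; they are not part of any MongoDB API:

```javascript
// Sample "rates" array from the question (plain numbers instead of NumberInt).
const rates = [
  { productId: 1234, rate: 100, rateCardId: 1, month: 201801 },
  { productId: 1234, rate: 200, rateCardId: 1, month: 201802 },
  { productId: 1234, rate: 400, rateCardId: 2, month: 201803 },
  { productId: 1235, rate: 500, rateCardId: 1, month: 201801 },
  { productId: 1235, rate: 234, rateCardId: 2, month: 201803 }
];

// The "missing" entry from the change set.
const wanted = { productId: 1235, rateCardId: 1, month: 201802 };

// Model of { "rates.productId": ..., "rates.rateCardId": ..., "rates.month": ... }:
// each condition may be satisfied by a DIFFERENT array element.
function dotNotationMatches(array, conditions) {
  return Object.keys(conditions).every(key =>
    array.some(element => element[key] === conditions[key]));
}

// Model of { "rates": { "$elemMatch": { ... } } }:
// a SINGLE array element must satisfy all conditions at once.
function elemMatchMatches(array, conditions) {
  return array.some(element =>
    Object.keys(conditions).every(key => element[key] === conditions[key]));
}

const viaDotNotation = dotNotationMatches(rates, wanted);
const viaElemMatch = elemMatchMatches(rates, wanted);

// The $push should go ahead only when no single element matches.
const shouldPush = !viaElemMatch;

console.log({ viaDotNotation, viaElemMatch, shouldPush });
```

Dot notation reports a match because productId 1235, rateCardId 1 and month 201802 each appear somewhere in the array, just never on the same element. Only the $elemMatch form correctly reports that the entry is absent, so negating it with $not selects the document for the $push.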

That single request issued to the server essentially attempts four update operations: one to update the matched array elements, and one for each of the three "change set" entries, attempting to $push where the document was found not to contain a matching array element.

The result is therefore as expected:

{
        "_id" : 12345,
        "_class" : "com.example.ProductRates",
        "rates" : [
                {                               // matched and updated
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {                               // matched and updated
                        "productId" : 1234,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201802
                },
                {
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 2,
                        "month" : 201803
                },
                {
                        "productId" : 1235,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {
                        "productId" : 1235,
                        "rate" : 234,
                        "rateCardId" : 2,
                        "month" : 201803
                },
                {                              // This was appended
                        "productId" : 1235,
                        "rate" : 700,
                        "rateCardId" : 1,
                        "month" : 201802
                }
        ]
}

Depending on how many elements were actually un-matched, the bulkWrite() response will report how many of those statements actually matched and affected a document. In this case it's 2 matched and modified, since the "first" update operation matches the existing array entries, and the "last" update finds that the document does not contain the array entry and performs the $push to modify it.
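The counting can be traced with a small in-memory model. This is only a sketch of the ordered sequence of operations under simplified assumptions (equality matching, and setting just "rate" instead of replacing the whole element); the function simulateBulk and the shape of its return value are invented for illustration and are not the server's implementation or the driver's result object:

```javascript
// Two entries refer to the same array element when all three key fields agree.
function sameKey(a, b) {
  return a.productId === b.productId &&
    a.rateCardId === b.rateCardId &&
    a.month === b.month;
}

// Hypothetical model of the ordered bulk request described above.
function simulateBulk(doc, changeSet) {
  let matched = 0;
  let modified = 0;

  // Operation 1: $set with arrayFilters. The filter is only { _id }, so the
  // document always matches; it counts as modified if any rate really changed.
  matched += 1;
  let changed = false;
  doc.rates.forEach(element => {
    changeSet.forEach(change => {
      if (sameKey(element, change) && element.rate !== change.rate) {
        element.rate = change.rate;
        changed = true;
      }
    });
  });
  if (changed) modified += 1;

  // Operations 2..n: one $push per change, guarded by $not / $elemMatch.
  changeSet.forEach(change => {
    const exists = doc.rates.some(element => sameKey(element, change));
    if (!exists) {
      matched += 1;
      doc.rates.push(Object.assign({}, change));
      modified += 1;
    }
  });

  return { matched: matched, modified: modified };
}

const doc = {
  _id: 12345,
  rates: [
    { productId: 1234, rate: 100, rateCardId: 1, month: 201801 },
    { productId: 1234, rate: 200, rateCardId: 1, month: 201802 },
    { productId: 1234, rate: 400, rateCardId: 2, month: 201803 },
    { productId: 1235, rate: 500, rateCardId: 1, month: 201801 },
    { productId: 1235, rate: 234, rateCardId: 2, month: 201803 }
  ]
};

const changeSet = [
  { productId: 1234, rate: 400, rateCardId: 1, month: 201801 },
  { productId: 1234, rate: 500, rateCardId: 1, month: 201802 },
  { productId: 1235, rate: 700, rateCardId: 1, month: 201802 }
];

const counts = simulateBulk(doc, changeSet);
console.log(counts, doc.rates.length);
```

Of the four operations, the two $push attempts for entries that already exist select no document, which leaves two operations that match and modify.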

Conclusion

So there you have the combined approach, where:

  • The first part of "updating" in your question is very easy and can be done in a single statement, as is demonstrated in the first section.

  • The second part where there is an array element which "does not presently exist" within the current document array, this actually requires you use bulkWrite() in order to issue "multiple" operations in a single request.

So the update itself is a "YES" for a single operation, but adding the missing entries means multiple operations. You can combine the two approaches just as demonstrated here.


There are many "fancy" ways in which you can construct these statements based on the "change set" array contents with code, so you don't need to "hardcode" each member.

As a basic case for JavaScript and compatible with the current release of the mongo shell ( which somewhat annoyingly does not support object spread operators ):

db.getCollection('avail_rates_copy').drop();
db.getCollection('avail_rates_copy').insert(
  {
    "_id" : 12345,
    "_class" : "com.example.ProductRates",
    "rates" : [
      {
        "productId" : 1234,
        "rate" : 100,
        "rateCardId" : 1,
        "month" : 201801
      },
      {
        "productId" : 1234,
        "rate" : 200,
        "rateCardId" : 1,
        "month" : 201802
      },
      {
        "productId" : 1234,
        "rate" : 400,
        "rateCardId" : 2,
        "month" : 201803
      },
      {
        "productId" : 1235,
        "rate" : 500,
        "rateCardId" : 1,
        "month" : 201801
      },
      {
        "productId" : 1235,
        "rate" : 234,
        "rateCardId" : 2,
        "month" : 201803
      }
    ]
  }
);

var changeSet = [
  {
      "productId" : 1234, 
      "rate" : 400.0, 
      "rateCardId": 1,
      "month" : 201801
  }, 
  {
      "productId" : 1234, 
      "rate" : 500.0, 
      "rateCardId": 1,
      "month" : 201802
  }, 
  {

      "productId" : 1235, 
      "rate" : 700.0, 
      "rateCardId": 1,
      "month" : 201802
  }
];

var arrayFilters = changeSet.map((obj,i) => 
  Object.keys(obj).filter(k => k != 'rate' )
    .reduce((o,k) => Object.assign(o, { [`u${i}.${k}`]: obj[k] }) ,{})
);

var $set = changeSet.reduce((o,r,i) =>
  Object.assign(o, { [`rates.$[u${i}].rate`]: r.rate }), {});

var updates = [
  { "updateOne": {
    "filter": { "_id": 12345 },
    "update": { $set },
    arrayFilters
  }},
  ...changeSet.map(obj => (
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": Object.keys(obj).filter(k => k != 'rate')
              .reduce((o,k) => Object.assign(o, { [k]: obj[k] }),{})
          }
        }
      },
      "update": {
        "$push": {
          "rates": obj
        }
      }
    }}
  ))
];

db.getCollection('avail_rates_copy').bulkWrite(updates,{ ordered: true });

This will dynamically construct a list of "Bulk" update operations which would look like:

[
  {
    "updateOne": {
      "filter": {
        "_id": 12345
      },
      "update": {
        "$set": {
          "rates.$[u0].rate": 400,
          "rates.$[u1].rate": 500,
          "rates.$[u2].rate": 700
        }
      },
      "arrayFilters": [
        {
          "u0.productId": 1234,
          "u0.rateCardId": 1,
          "u0.month": 201801
        },
        {
          "u1.productId": 1234,
          "u1.rateCardId": 1,
          "u1.month": 201802
        },
        {
          "u2.productId": 1235,
          "u2.rateCardId": 1,
          "u2.month": 201802
        }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId": 1234,
              "rateCardId": 1,
              "month": 201801
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId": 1234,
            "rate": 400,
            "rateCardId": 1,
            "month": 201801
          }
        }
      }
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId": 1234,
              "rateCardId": 1,
              "month": 201802
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId": 1234,
            "rate": 500,
            "rateCardId": 1,
            "month": 201802
          }
        }
      }
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId": 1235,
              "rateCardId": 1,
              "month": 201802
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId": 1235,
            "rate": 700,
            "rateCardId": 1,
            "month": 201802
          }
        }
      }
    }
  }
]

This is just as described in the "long form" of the general answer, except that it uses the input "array" content to construct all of those statements.

You can do such dynamic object construction in any language, and all MongoDB drivers accept input of some type of structure you are allowed to "manipulate" which is then transformed to BSON before it's actually sent to the server for execution.

NOTE : The <identifier> for arrayFilters must consist of alpha-numeric characters and must begin with a lowercase letter. Hence whilst constructing the dynamic statement we prefix with "u" and then the current array index for each item being processed.
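A minimal sketch of that naming rule applied while building the first "updateOne" operation might look like the following. The names IDENTIFIER_PATTERN, makeIdentifier and buildArrayFilterUpdate are hypothetical, and the regular expression is my reading of the rule (a lowercase first letter followed by alpha-numeric characters only), so treat it as an assumption rather than the server's exact validation:

```javascript
// Assumed shape of a valid identifier: lowercase letter first, then alpha-numerics.
const IDENTIFIER_PATTERN = /^[a-z][a-zA-Z0-9]*$/;

// Hypothetical helper: "u" + index always satisfies the pattern above.
function makeIdentifier(index) {
  const identifier = 'u' + index;
  if (!IDENTIFIER_PATTERN.test(identifier)) {
    throw new Error('Invalid arrayFilters identifier: ' + identifier);
  }
  return identifier;
}

// Hypothetical helper: build the $set / arrayFilters operation for the "rates"
// array, using keyFields to find each element and valueField as the new value.
function buildArrayFilterUpdate(docId, changeSet, keyFields, valueField) {
  const arrayFilters = [];
  const $set = {};

  changeSet.forEach((change, index) => {
    const identifier = makeIdentifier(index);
    const filter = {};
    keyFields.forEach(field => {
      filter[identifier + '.' + field] = change[field];
    });
    arrayFilters.push(filter);
    $set['rates.$[' + identifier + '].' + valueField] = change[valueField];
  });

  return {
    updateOne: {
      filter: { _id: docId },
      update: { $set: $set },
      arrayFilters: arrayFilters
    }
  };
}

const operation = buildArrayFilterUpdate(
  12345,
  [
    { productId: 1234, rate: 400, rateCardId: 1, month: 201801 },
    { productId: 1234, rate: 500, rateCardId: 1, month: 201802 },
    { productId: 1235, rate: 700, rateCardId: 1, month: 201802 }
  ],
  ['productId', 'rateCardId', 'month'],
  'rate'
);

console.log(JSON.stringify(operation, null, 2));
```

Passing the key fields explicitly, rather than filtering out 'rate', keeps the filters stable if the change set later gains extra properties that are not part of the identifier.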

Neil Lunn answered Sep 29 '22 23:09
