How to pull one instance of an item in an array in MongoDB?

According to the documentation:

The $pull operator removes from an existing array all instances of a value or values that match a specified condition.

Is there an option to remove only the first instance of a value? For example:

var array = ["bird","tiger","bird","horse"]

How can the first "bird" be removed directly in an update call?

anges244 asked Aug 14 '15 21:08


2 Answers

So you are correct in that the $pull operator does exactly what the documentation says: its arguments are in fact a "query" used to match the elements that are to be removed.

If the element to remove is always in the "first" position of the array, as in your example, then the $pop operator with a value of -1 removes that first element.

With the basic node driver:

collection.findOneAndUpdate(
    { "array.0": "bird" },       // "array.0" is matching the value of the "first" element 
    { "$pop": { "array": -1 } },
    { "returnOriginal": false },
    function(err,doc) {

    }
);

With mongoose the argument to return the modified document is different:

MyModel.findOneAndUpdate(
    { "array.0": "bird" },
    { "$pop": { "array": -1 } },
    { "new": true },
    function(err,doc) {

    }
);

But neither is of much use if the array position of the "first" item to remove is not known.

For the general approach you need "two" updates: one to match the first item and replace it with a unique placeholder, and a second to actually remove that placeholder.

This is much simpler if you apply plain updates and do not ask for the returned document, and it can also be done in bulk across documents. It also helps to use something like async.series in order to avoid nesting your calls:

async.series(
    [
        // First update: set the first matching "bird" in each document to null
        function(callback) {
            collection.update(
                { "array": "bird" },
                { "$unset": { "array.$": "" } },
                { "multi": true },
                callback
            );
        },
        // Second update: remove the null placeholders
        function(callback) {
            collection.update(
                { "array": null },
                { "$pull": { "array": null } },
                { "multi": true },
                callback
            );
        }
    ],
    function(err) {
        // comes here when finished or on error
    }
);

So using the $unset here with the positional $ operator allows the "first" item to be changed to null. Then the subsequent query with $pull just removes any null entry from the array.

That is how you safely remove the "first" occurrence of a value from an array. Determining whether the array contains the same value more than once is another question, though.
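To make the two steps concrete, here is a rough in-memory illustration of what each update does to a single document's array. It is plain JavaScript and not driver code, and the function names unsetFirstMatch and pullNulls are made up for this sketch:

```javascript
// In-memory illustration only: mimics the effect of the two updates on one
// document's array. The function names are hypothetical, not a MongoDB API.

// Step 1: like { "$unset": { "array.$": "" } } with the filter { "array": value },
// which sets the first matching element to null rather than removing it.
function unsetFirstMatch(array, value) {
  const copy = array.slice();
  const index = copy.indexOf(value);
  if (index !== -1) {
    copy[index] = null;
  }
  return copy;
}

// Step 2: like { "$pull": { "array": null } }, which removes every null entry.
function pullNulls(array) {
  return array.filter(item => item !== null);
}

const original = ["bird", "tiger", "bird", "horse"];
const marked = unsetFirstMatch(original, "bird");
const result = pullNulls(marked);

console.log(marked); // [ null, 'tiger', 'bird', 'horse' ]
console.log(result); // [ 'tiger', 'bird', 'horse' ]
```

One thing the sketch makes visible: the second step removes every null in the array, so any null values that were already stored there would be removed as well.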

Blakes Seven answered Sep 19 '22 05:09


It's worth noting that while the other answer here is correct that the general approach is to $unset the matched array element in order to create a null value and then $pull just the null values from the array, there are better ways to implement this in modern MongoDB versions.

Using bulkWrite()

As an alternative to submitting two update operations in sequence as separate requests, modern MongoDB releases support bulk operations via the recommended bulkWrite() method, which allows those multiple updates to be submitted as a single request with a single response:

    collection.bulkWrite(
      [
        { "updateOne": {
          "filter": { "array": "bird" },
          "update": {
            "$unset": { "array.$": "" }
          }
        }},
        { "updateOne": {
          "filter": { "array": null },
          "update": {
            "$pull": { "array": null }
          }
        }}
      ]
    );

This does the same thing as the other answer, which sends two requests, but here it is just one request. This can save a lot of overhead in server communication, so it's generally the better approach.
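If you need this for more than one field or value, the two operations can be generated. The following is only a sketch: removeFirstOps is a hypothetical helper, not part of any driver, and its optional match argument is an assumption added here so that both steps can be pinned to the same document.

```javascript
// Hypothetical helper, not part of any driver: builds the two updateOne
// operations for a given array field and value. The optional `match` object
// is an assumption added here so both steps can target the same document.
function removeFirstOps(field, value, match = {}) {
  return [
    // Step 1: set the first matching element to null via the positional $ operator
    { updateOne: {
      filter: { ...match, [field]: value },
      update: { $unset: { [`${field}.$`]: "" } }
    }},
    // Step 2: pull the null placeholder(s) out of the array
    { updateOne: {
      filter: { ...match, [field]: null },
      update: { $pull: { [field]: null } }
    }}
  ];
}

const ops = removeFirstOps("array", "bird");
console.log(JSON.stringify(ops, null, 2));
```

With a real collection the result would be passed as collection.bulkWrite(ops). Without a match, the second filter is not tied to the document changed by the first operation, so something like an _id condition in match is worth considering when several documents could contain null entries.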

Using Aggregation Expressions

With the release of MongoDB 4.2, aggregation expressions are now allowed in the various "update" operations of MongoDB. The update is given as a pipeline built from a restricted set of stages: $addFields, $set ( which is an alias of $addFields meant to make these "update" statements read more logically ), $project and its alias $unset, or $replaceRoot and its alias $replaceWith. Basically only pipeline stages which return a "reshaped" document are allowed.

collection.updateOne(
  { "array": "horse" },
  [
    { "$set": {
      "array": {
        "$concatArrays": [
          { "$slice": [ "$array", 0, { "$indexOfArray": [ "$array", "horse" ] }] },
          { "$slice": [
            "$array",
            { "$add": [{ "$indexOfArray": [ "$array", "horse" ] }, 1] },
            { "$size": "$array" }
          ]}
        ]
      }
    }}
  ]
);

In this case the $slice and $indexOfArray operators are used to piece together a new array which "skips" over the first matched array element. These pieces are joined via the $concatArrays operator, returning a new array without the first matched element. Note that the aggregation $slice requires a positive count when a starting position is given, so the expression as written assumes the first match is not at index 0.
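For anyone who finds the pipeline hard to read, here is a plain-JavaScript analogue of the same slicing logic. It runs in memory only and is not the pipeline itself; removeFirst is a name made up for this sketch.

```javascript
// Plain-JavaScript analogue of the pipeline expression, for illustration only.
// `removeFirst` is a hypothetical name, not a MongoDB or driver function.
function removeFirst(array, value) {
  const index = array.indexOf(value);      // plays the role of $indexOfArray
  if (index === -1) {
    return array.slice();                  // no match: return an unchanged copy
  }
  const before = array.slice(0, index);    // first $slice: elements before the match
  const after = array.slice(index + 1);    // second $slice: elements after the match
  return before.concat(after);             // $concatArrays joins the two pieces
}

const input = ["bird", "tiger", "horse", "bird", "horse"];
console.log(removeFirst(input, "horse")); // [ 'bird', 'tiger', 'bird', 'horse' ]
```

The input array is left untouched and a new array is returned, which mirrors how the pipeline computes a new value for the field rather than editing the array in place.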

This is now probably more efficient, since what was already a single request is now also a single operation, and so should incur a little less server overhead.

Of course the only catch is that this is not supported in any release of MongoDB prior to 4.2. The bulkWrite() method, on the other hand, may be a newer API, but the underlying calls to the server apply back to MongoDB 2.6, which implemented the actual "Bulk API" calls, and the way all core drivers implement this method means it even falls back to work with earlier versions.

Demonstration

As a demonstration, here is a listing of both approaches:

const mongoose = require('mongoose');
const { Schema } = mongoose;

const uri = 'mongodb://localhost:27017/test';
const opts = { useNewUrlParser: true, useUnifiedTopology: true };

mongoose.Promise = global.Promise;

mongoose.set('debug', true);
mongoose.set('useCreateIndex', true);
mongoose.set('useFindAndModify', false);


const arrayTestSchema = new Schema({
  array: [String]
});

const ArrayTest = mongoose.model('ArrayTest', arrayTestSchema);

const array = ["bird", "tiger", "horse", "bird", "horse"];

const log = data => console.log(JSON.stringify(data, undefined, 2));

(async function() {

  try {
    const conn = await mongoose.connect(uri, opts);

    await Promise.all(
      Object.values(conn.models).map(m => m.deleteMany())
    );

    await ArrayTest.create({ array });

    // Use bulkWrite update
    await ArrayTest.bulkWrite(
      [
        { "updateOne": {
          "filter": { "array": "bird" },
          "update": {
            "$unset": { "array.$": "" }
          }
        }},
        { "updateOne": {
          "filter": { "array": null },
          "update": {
            "$pull": { "array": null }
          }
        }}
      ]
    );

    log({ bulkWriteResult: (await ArrayTest.findOne()) });

    // Use aggregation expression
    await ArrayTest.collection.updateOne(
      { "array": "horse" },
      [
        { "$set": {
          "array": {
            "$concatArrays": [
              { "$slice": [ "$array", 0, { "$indexOfArray": [ "$array", "horse" ] }] },
              { "$slice": [
                "$array",
                { "$add": [{ "$indexOfArray": [ "$array", "horse" ] }, 1] },
                { "$size": "$array" }
              ]}
            ]
          }
        }}
      ]
    );

    log({ aggregateWriteResult: (await ArrayTest.findOne()) });

  } catch (e) {
    console.error(e);
  } finally {
    mongoose.disconnect();
  }


})();

And the output:

Mongoose: arraytests.deleteMany({}, {})
Mongoose: arraytests.insertOne({ array: [ 'bird', 'tiger', 'horse', 'bird', 'horse' ], _id: ObjectId("5d8f509114b61a30519e81ab"), __v: 0 }, { session: null })
Mongoose: arraytests.bulkWrite([ { updateOne: { filter: { array: 'bird' }, update: { '$unset': { 'array.$': '' } } } }, { updateOne: { filter: { array: null }, update: { '$pull': { array: null } } } } ], {})
Mongoose: arraytests.findOne({}, { projection: {} })
{
  "bulkWriteResult": {
    "array": [
      "tiger",
      "horse",
      "bird",
      "horse"
    ],
    "_id": "5d8f509114b61a30519e81ab",
    "__v": 0
  }
}
Mongoose: arraytests.updateOne({ array: 'horse' }, [ { '$set': { array: { '$concatArrays': [ { '$slice': [ '$array', 0, { '$indexOfArray': [ '$array', 'horse' ] } ] }, { '$slice': [ '$array', { '$add': [ { '$indexOfArray': [ '$array', 'horse' ] }, 1 ] }, { '$size': '$array' } ] } ] } } } ])
Mongoose: arraytests.findOne({}, { projection: {} })
{
  "aggregateWriteResult": {
    "array": [
      "tiger",
      "bird",
      "horse"
    ],
    "_id": "5d8f509114b61a30519e81ab",
    "__v": 0
  }
}

NOTE: The example listing uses mongoose, partly because it was referenced in the other answer and partly to demonstrate an important point about the aggregate syntax example. Note that the code uses ArrayTest.collection.updateOne(), since in the present release of Mongoose ( 5.7.1 at time of writing ) the aggregation pipeline syntax for such updates is stripped out by the standard mongoose Model methods.

As such, the .collection accessor can be used to get the underlying Collection object from the core MongoDB Node driver. This is required until a fix is made to mongoose which allows this expression to be included.

Neil Lunn answered Sep 19 '22 05:09